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REAL-TIME SPATIO-TEMPORAL 
COHERENCE ESTIMATION FOR 
AUTONOMOUS MODE IDENTIFICATION 
AND INVARIANCE TRACKING 

CROSS REFERENCE TO RELATED 
APPLICATIONS 

The present application claims priority from the following 
U.S. Provisional application: U.S. Application No. 60/274, 
536, filed Mar. 8, 2001 and titled “Exception Analysis for 10 
Multimissions” which is incorporated herein by reference 
for all purposes. The present invention is related to concur- 
rently filed and commonly owned U.S. Non-provisional 
application Ser. No. 10/092,491 entitled “Exception Analy- 
sis for Multimissions” and is herein incorporated by refer- 15 
ence for all purposes. 

STATEMENT AS TO RIGHTS TO INVENTIONS 
MADE UNDER FEDERALLY SPONSORED 

RESEARCH OR DEVELOPMENT 20 

The U.S. Government has certain rights in this invention 
pursuant to Grant Nos. NPO 21025, NPO 20803, NPO 
20829, NPO 20827 and NPO 21126 awarded by NASA. 

BACKGROUND OF THE INVENTION 

The invention relates generally to system health 
assessment, and more specifically to diagnosis and progno- 
sis of system performance, errant system conditions, and 
abnormal system behavior in an instrumented system. 

Complex systems typically cannot tolerate a long down 30 
time and so need to be constantly monitored. For example, 
a semiconductor fabrication facility cannot afford to be 
offline for an extended period of time. In addition to the loss 
of wafer production, it takes considerable time to restart the 
line. A patient monitoring station must have high reliability 35 
in order to be useful. Spacecraft must be constantly moni- 
tored in order to detect faults and to detect trends in system 
operation which may lead to faults, so that proactive cor- 
rective action can be taken. 

It is also important to avoid false positive indications of 40 
a system error. It is both costly and time consuming to bring 
a system down, replace or repair the supposed error, and 
bring the system back up only to discover that the incorrect 
remedy was taken. 

As advances in technology permit higher degrees of 45 
integration both at the component level and at the system 
level, systems become increasingly more complex. 
Consequently, improvements for determining system per- 
formance and assessing system health are needed, to 
adequately detect system faults and operational trends that 50 
might lead to system failure. 

BRIEF SUMMARY OF THE INVENTION 

A method and apparatus for diagnosis and prognosis of 
faults in accordance with embodiments of the invention is 55 
based on sensor data and discrete data. In an embodiment of 
the invention, anomaly detection is made based on a statis- 
tical analysis of time -correlated sensor data and on mode of 
operation of the system as determined by the discrete data. 

In another embodiment of the invention, anomaly detection 60 
is further based on a statistical analysis of individual sensor 
data. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The teachings of the present invention can be readily 65 
understood by considering the following detailed descrip- 
tion in conjunction with the accompanying drawings: 
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FIG. 1A illustrates the present invention in the context of 
its general operating environment; 

FIG. IB shows a typical application of the invention; 

FIG. 2 shows a high level block diagram of an illustrative 
example of the present invention; 

FIG. 3 is a high level block diagram of the coherence - 
based fault detector illustrated in FIG. 2; 

FIG. 4 shows a typical distribution of the auto regressive 
parameters a,- of Eq. Eq. (4.2.3); 

FIG. 5 is a high level block diagram illustrating an 
example of the dynamical anomaly detector illustrated in 
FIG. 2; 

FIG. 6 shows a schematic of simple gas turbine as an 
example illustrating the present invention; 

FIGS. 7A and 7B are high level block diagrams illustrat- 
ing an example of the model filter component shown in FIG. 
2 ; 

FIG. 8 illustrates an example of ensemble of possible time 
series; 

FIG. 9 illustrates an example of a time series forecast in 
the form of a predicted mean and probability distribution 
produced by averaging the time series ensemble of FIG. 8; 

FIG. 10 is a high level block diagram illustrating an 
example of an embodiment of the prognostic assessment 
module shown in FIG. 2; 

FIG. 11 highlights the symbolic components of FIG. 2; 

FIG. 12 is a high level block diagram illustrating an 
example of an embodiment of the symbolic data model 
module of FIG. 2; 

FIG. 13 is a high level block diagram illustrating an 
example of an embodiment of the predictive comparison 
module of FIG. 2; 

FIG. 14 is a high level block diagram illustrating an 
example of an embodiment of the causal system model 
module of FIG. 2; and 

FIG. 15 is a high level block diagram illustrating an 
example of an embodiment of the interpretation module of 
FIG. 2. 

DESCRIPTION OF THE SPECIFIC 
EMBODIMENTS 

1.0 Introduction 

BEAM stands for Beacon-based Exception Analysis for 
Multimissions, an end-to-end method of data analysis 
intended for real-time fault detection and characterization. 
Its intent was to provide a generic system analysis capability 
for potential application to deep space probes and other 
highly automated systems. Such systems are typified by 
complex and often unique architectures, high volumes of 
data collection, limited bandwidth, and a critical need for 
flexible and timely decision abilities. 

Two important features make BEAM standout among the 
various fault-protection technologies that have been 
advanced. The first is its broad range of applicability. This 
approach has been used with sensor and computed data of 
radically different types, on numerous systems, without 
detailed system knowledge or a priori training. Separate 
components are included to treat time-varying signals and 
discrete data, and to interpret the combination of results. 

The second is its ability to detect, and with few exceptions 
correctly resolve, faults for which the detector has not been 
trained. This flexibility is of prime importance in systems 
with low temporal margins and those with complex envi- 
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ronmental interaction. This ability also comes with few 
requirements in terms of detector training. 

We will begin with the motivation and general problem 
statement. Following this, we will examine the overall 
architecture at a block level. We will then examine the 
individual component technologies in depth, including the 
governing equations and such issues as training, interface, 
and range of applicability. 

2.0 History 

BEAM was initially proposed to investigate a new system 
health analysis method. The purpose of BEAM was to fill a 
critical role in supporting the new Beacon method of opera- 
tions. Beacon operation can be described as a condition- 
specific mode of communication between a remote system 
and ground personnel, where lengthy transmissions are sent 
to the operators only where unexpected spacecraft behavior 
warrants the expense. Instead of the current method, where 
large amounts of data are downlinked on a regular basis, the 
Beacon paradigm uses two transmitters — one high-gain 
transmitter as before and a new very low-gain transmitter, 
the “beacon.” The beacon transmits one of four tones at all 
times, indicating whether or not the spacecraft needs to 
dump engineering data. A first “tone” would be sent during 
normal operation to signify that all is well. If there are 
significant faults detected, then other tones are transmitted to 
indicate that data explaining the problem needs to be 
retrieved. 

Such a paradigm is highly desirable if downlink capacity 
is limited. Although the number of NASA missions is 
increasing dramatically, along with their complexity, the 
amount of available communications antenna resources 
remains fairly constant. The Beacon paradigm more effi- 
ciently shares such limited resources. 

Beacon operation requires several ingredients, and is 
therefore only practicable if a number of hurdles can be 
overcome. Central to the method is the ability of a spacecraft 
to perform an accurate, timely, and verifiable self-diagnosis. 
Such a self-diagnosis must not only provide a safe operating 
envelope, but must perform comparably at worst to the 
system experts in the control room. A system that is 
insensitive, generates false alarms, or requires oversight will 
not be effective, because such inefficiencies will be ampli- 
fied in a “streamlined” process. 

The basic premise of BEAM was the following: Construct 
a generic strategy to characterize a system from all available 
observations, and then train this characterization with 
respect to normal phases of operation. In this regard the 
BEAM engine functions much as a human operator does — 
through experience and other available resources (known 
architecture, models, etc.) an allowed set of behavior is 
“learned” and deviations from this are noted and examined. 
Such an approach should be applied as a complement to 
simplistic but reliable monitors and alarms found in nearly 
all instrumented systems. The approach should also be 
constructed such that information products can be used to 
drive autonomic decisions, or to support the decisions of 
human operators. In other words, the system must allow for 
intervention and aid it wherever possible. If this is not done, 
it is difficult for spacecraft experts to gain trust in the system 
and the benefits of beacon operation or any similar cost- 
saving approach will be doomed from the outset. 

Philosophically, the BEAM approach is similar to that of 
two familiar advanced techniques, namely Neural Networks 
and Expert Systems. Each of these has its particular 
strengths and weaknesses: 
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1. Neural networks are effective means of sensor fusion 
and separation between classes of data. However, they 
are difficult to adapt to cases where unknown classes 
exist. Furthermore, identifying the source of fault indi- 

5 cated by a neural network is not usually possible (low 
traceability). 

2. Expert Systems are effective and adaptable means of 
applying logical rules and decisions, but are only as 
good as the rules that are available to them. Such 

10 systems also perform badly in the face of unknown 
fault classes and are expensive to train. 

On the other hand, each of these approaches does have 
powerful benefits as well, and we will see how they have 
been blended with our novel techniques to take full advan- 
15 tage of available technology. 

Aspects of the invention as described by the illustrated 
example of the disclosed architectural description include: 

1. direct insertion of physical models (gray box); 

2Q 2. integration of symbolic reasoning components; 

3. statistical and stochastic modeling of individual sig- 
nals; 

4. trending to failure for individual signals and cross- 
signal features; and 

25 5. expert system as enumerator of results. 

BEAM is a specific system embodying all the aspects of 
the invention as disclosed herein. It represents the inventors’ 
best mode for practicing the invention at the time of the 
invention. The “beacon” aspect of the BEAM system is 
30 simply the communication front-end which relies on a 
paradigm suitable for systems where communication band- 
width is limited. It will be appreciated that the various 
embodiments of the present invention constitute the core 
components of the BEAM architecture, which do not rely on 
35 beacon-based communications. However, reference to 
BEAM will be made throughout the discussion which 
follows, for the simple reason that the BEAM architecture 
contains the various embodiments of the present invention. 

One of ordinary skill in the art will readily appreciate that 
40 any system can be adapted with aspects of the present 
invention to realize the benefits and advantages of fault 
detection and assessment in accordance with embodiments 
of the invention. Systems where teams of experts who 
review the engineering data on a regular basis and who 
45 typically oversee the analysis process are well suited for a 
fault detection and characterization system according to 
teachings of the invention. Complex machinery that rely 
upon internal sensing and expert operators to provide oper- 
ating margins for reasons of safety or efficiency can benefit 
50 from the invention. System maintenance can be greatly 
simplified by incorporating various aspects of the invention 
into a maintenance system. The invention is particularly well 
suited to systems that are fully automated and require 
advanced fault detection and isolation techniques. 

55 The generalized block diagram shown in FIG. 1A illus- 
trates the general applicability of the present invention in a 
system. A monitored system 100 includes a target system 
102 for which its operational health is to be monitored. The 
target system 102 has associated with it various sources 104 
60 of operational data relating to its operation. This includes 
information such as data from physical sensors (e.g., 
transducers), data produced by software running on the 
target system, switch settings, and so on. 

The operational data is conveyed over a channel 112 to a 
65 data analysis engine 106. The data analysis engine com- 
prises a computing system for processing the information in 
accordance with various embodiments of the invention. The 
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computing system can be any known computing 
environment, comprising any of a number of computer 
architectures and configurations. 

FIG. IB illustrates and example of a complex system 
which incorporates the present invention. A satellite device 
102' includes data sources 104' which produce operational 
data. Generally, the data is communicated by a downlink 
signal 112' to ground stations 106', where the information is 
subjected to the analytical techniques of the present inven- 
tion. In practice, some aspects of the present invention can 
be embodied in the satellite device itself in addition to the 
ground station computers. The downlink signal comprises a 
beacon component and a data stream component. The bea- 
con component is in effect during most of the time, trans- 
mitting an appropriate beacon tone. However, when a prob- 
lem arises, the downlink signal begins transmitting 
appropriate data to the ground stations computers. 

3.0 Overall Architecture 

At the simplest level of abstraction, BEAM is software 
which takes sensor data and other operational data as input 
and reports fault status as output. Implementation of this 
software is dependent on the application, but a typical 
application would have a system with a number of indi- 
vidual components, each of which reports health or perfor- 
mance data to a local computer, whereupon a local BEAM 
manager draws a conclusion during runtime and forwards 
the results to a decision maker. We must consider the 
physical makeup of the device in question when deciding 
how to compartmentalize the diagnosis. Consideration must 
be made for computing power, communication and data 
buses, and the natural divisions present in the system. To 
accommodate such a wide range of possibilities, the com- 
putational engine of BEAM itself is highly adaptable with 
respect to subsystem size and complexity. 

For each single compartment or subsystem, we can expect 
to receive four types of data: 

1. Discrete status variables changing in time — mode 
settings, switch positions, health bits, etc. 

2 . Real-valued sensor data varying at fixed rates — 
performance sensors or dedicated diagnostic sensors. 
The sensor data, in the context of the present invention, 
includes any data relating to the physical condition of 
the machine. For example, sensor data may include 
data from force transducers, chemical sensors, thermal 
sensors, accelerometers, rotational sensors, and so on. 

3. Command information — typically discrete as in 1. 

4. Fixed parameters — varying only when commanded to 
change but containing important state information. 

These types of data are all of value but in different ways. 
Status variables and commands are useful to a symbolic 
model. Commands and fixed parameters are useful to a 
physical system model. Time-varying sensor data is useful 
for signal processing approaches. In order to study each and 
combine results, the following architecture in accordance 
with the invention is set forth as presented in FIG. 2. 

A few notes about the architecture are in order before we 
consider the individual descriptions of its components. 
Specifically, we should consider the data flow, which is 
somewhat complicated: 

1. Fixed parameters and command information are input 
to the specific system models (if any). These are the 
Symbolic Model and the Physical Model components. 
These data will not be propagated further and influence 
other components only through the model outputs. 

2 . Discrete variables will only be propagated through the 
symbolic components as indicated in FIG. 2 by the 
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heavy lines. The effect of the symbolic components on 
the signal processing components is limited to mode 
determination as indicated. 

3. Time-varying quantities are separated into two groups 
5 as part of the training process. Specifically, signals with 

high degrees of correlation to others, or those which are 
not expected to uniquely indicate a severe fault, are 
only passed to the Coherence Analysis components. 
Signals that may uniquely indicate a fault, along with 
10 those already flagged as faulty by the coherence 
analysis, are also passed through the feature extraction 
components. 

4. The split between time -varying signals described in 
note 3 above is a computational efficiency consider- 
ation and reflects general philosophy of operation, but 

" is not essential. Given adequate resources, there is 
nothing preventing all time-varying signals from being 
sent to both types of signal analysis at all times. 

Following is a summary of the duties of each component. 
Further clarifying details can be found in Appendix ABC 
20 attached herewith. 

Model Filter: The model filter 202 combines sensor data 
with physical model predictions (run in real-time). The 
model filter, also referred to as the Gray Box, combines 
sensor data with physical model predictions (run in real- 
25 time). We are interested in augmenting sensor data with 
intuition about the operation of the system. The inclusion of 
physical models where available is the most efficient means 
of incorporating domain knowledge into a signal-based 
processing approach. 

30 The usual methods of signal processing represent “Black 
Box” strategies; i.e. nothing is known about the internal 
governing equations of a system. Incidentally, the Dynami- 
cal Invariant Anomaly Detector component of an embodi- 
ment of the invention is another a black box technique. 
35 However, it features analytical methods in accordance with 
the teachings of the invention and so has merit beyond being 
a black box device. Such linear approaches are effective in 
general, but there are profound benefits to simultaneous 
consideration of sensor and physical information. The oppo- 
40 site perspective would be a “White Box” strategy, where a 
complete and accurate physical simulation was available for 
comparison to captured data. This case is desirable but rarely 
practical. In nearly all cases we must make do with a 
degraded or simplified model, either because of system 
45 complexity or computational limits. The “Gray Box” 
method serves to make use of whatever models are 
available, as we will briefly explore in this section. 

Any theoretical dynamical model includes two types of 
components: those directly describing the phenomena asso- 
50 ciated with the primary function of the system (such as the 
effect of torque exerted on the turbine shaft on rotor speed), 
and those representing secondary effects (such as frictional 
losses, heat losses, etc.). The first type is usually well 
understood and possesses a deterministic analytical 
55 structure, and therefore its behavior is fully predictable. On 
the other hand, the second type may be understood only at 
a much more complex level of description (i.e. at the 
molecular level) and cannot be simply incorporated into a 
theoretical model. In fact, some components may be poorly 
60 understood and lack any analytical description, such as 
viscosity of water in microgravity. The main idea of this 
approach is to filter out contributions that are theoretically 
predictable from the sensor data and focus on the compo- 
nents whose theoretical prediction is lacking. The filtering is 
65 performed by the theoretical model itself. 

If we assume that the theoretical model is represented by 
a system of differential equations, known physical processes 
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will be described with state variables and theoretical func- 
tions. But we will also have additional terms that describe 
phenomena for which a full mathematical description is 
unavailable or too complex. Examples of this include fric- 
tion in bearings, material viscosity, and secondary effects 
such as oscillations or flutter in mechanical systems. These 
leftover terms represent the unknown space of our partial 
model. 

If we substitute sensor data into the theoretical model, so 
long as the actual system performs as expected, there will be 
no departure from the model. However, an abnormality in 
performance will alter the behavior of the “leftover” terms. 

In general, we can treat the abnormality as the result of a 
stochastic process. If the abnormality is small compared to 
the modeled components of the system, it will suffice to 
assign some confidence interval for fault detection. 
However, if the accuracy of the model is poor, we must treat 
the leftover terms using stochastic or parameter estimation 
models, as described in following components. 

Compared to a straightforward signal analysis method, 
where highly stable dynamical models are available, the 
“black box” approach is not only more laborious, but is also 
less effective since the stochastic forces become deeply 
hidden in the sensor data. The practical upshot of the gray 
box approach is to use the theoretical model as a filter, which 
damps the deterministic components and amplifies the sto- 
chastic components, simplifying the task of fault detection 
for the other components. 

Because this approach can operate upon high- and low- 
fidelity models, it is highly effective as a means of prepro- 
cessing sensor data. Such models are available for the 
majority of autonomous systems, leftover from the design 
and analysis efforts to build such systems. 

To effectively implement this module, one must have such 
a design model available, or at the very least an approximate 
physical understanding of the system behavior. Such models 
must provide “reasonable” agreement with sensor data, and 
must therefore output directly comparable quantities at 
similar data rates. This step requires some human 
intervention, first to cast (or build) the model in a format to 
produce such results, and second to verify the model’s 
fidelity against a known standard — be that a superior model, 
an engineering model, or prototype or actual flight data. 
While the model need not be of precise fidelity to be useful, 
it is important to confirm the stability of the model and its 
comparability to sensor information. Use of a poor model 
will increase the reliance upon signal-based methods 
downstream, or in extreme cases destabilize the method 
entirely, in which case only signal-based methods may be 
applied to the sensor data. 

Symbolic Data Model: The symbolic data model 204 
interprets status variables and commands to provide an 
accurate, evolving picture of system mode and requested 
actions. 

In the overall BEAM strategy, real-time measurements 
are combined with predicted and expected behavior along 
with predicted performance to quickly isolate candidate 
faults. The Symbolic Data Model (SDM) is the first line of 
defense in determining the overall health of the system and 
it is the primary component that determines its active and 
predicted states. It operates by examining the values from 
the system status variables and system commands to provide 
an accurate, evolving picture of system mode and requested 
actions. The overall approach afforded by BEAM extends 
considerably beyond more conventional symbolic reason- 
ing. Since most rule-based diagnostic systems (expert 
systems) provide only this module and nothing else, they are 
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limited in that they can only identify and diagnose antici- 
pated problems. 

Knowledge in the SDM is represented as rules, which are 
themselves composed of patterns. The rule is the first 
5 Aristotelian syllogism in the form: If . . . Then .... The 
variables of the syllogism are joined by the And/Or logical 
connections. The selector Else points to other cases. This 
formula is a rule; the rules are sequenced in the succession 
of logical thinking or pointed at a jump in the sequence 
10 (Else Go To). 

Patterns are relations that may or may not have temporal 
constraints, i.e., may only hold true at certain times or persist 
for the entire duration of the analysis. Patterns define the 
constraints that must hold true in order for the antecedent to 
15 succeed. 

Conceptual representation is the main way to formulate 
the patterns of the system as part of a computer program. 
The essential tool the SDM uses is a rule; that is the reason 
why expert systems can also be called rule-based systems. 
20 The SDM operates by using many small slivers of knowl- 
edge organized into conditional If-Then rules. These rules 
are then operated on in a variety of different ways to perform 
different reasoning functions. 

Unlike the numeric models, the SDM requires a knowl- 
25 edge base in order to perform its analysis functions. From 
several points of view, representation of knowledge is the 
key problem of expert systems and of artificial intelligence 
in general. It is not by chance that the term “knowledge- 
based systems” has been applied to these products. 

30 The generic features of knowledge are embodied in this 
representation. In an embodiment of the invention, the 
domain expert stores the objects, actions, concepts, situa- 
tions and their relations using the SHINE (Spacecraft High- 
speed Inference Engine) representation language and this is 
35 stored in the SDM knowledge base. The collection of this 
knowledge represents the sum total of what the SDM will be 
able to understand. The SDM can only be as good as the 
domain expert that taught it. 

The SDM generates two primary kinds of results: derived 
40 states and discrepancies. To provide a uniform 
representation, we use the identical approach in performing 
each of these functions and they differ only in their knowl- 
edge bases that they use. Derived states are sent on to 
signal-processing components as well as other discrete com- 
45 ponents. Discrepancies, as the name implies, are concrete 
indications of faults. 

Coherence-Based Fault Detector: The coherence -based 
fault detector 206 tracks co-behavior of time-varying quan- 
tities to expose changes in internal operating physics. 
50 Anomalies are detected from the time-correlated multi- 
signal sensor data. The method is applicable to a broad class 
of problems and is designed to respond to any departure 
from normal operation, including faults or events that lie 
outside the training envelope. 

55 Also referred to as the SIE (System Invariance Estimator), 
it receives multiple time-correlated signals as input, as well 
as a fixed invariant library constructed during the training 
process (which is itself data-driven using the same time- 
correlated signals). It returns the following quantities: 

60 1. Mode-specific coherence matrix 

2. Event detection 

3. Comparative anomaly detection 

4. Anomaly isolation to specific signals 

65 5. Distance measure of off-nominal behavior 

As a first step of analysis, this computation makes a 
decision whether or not a fault is present, and reduces the 
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search space of data to one or a few signals. Time markers 
are included to indicate the onset of faulted data. These 
conclusions, which can be drawn for nearly any system, are 
then passed to other analysis components for further feature 
extraction, correlation to discrete data events, and interpre- 
tation. 

To motivate a cross-signal approach, consider that any 
continuously valued signal, provided it is deterministic, can 
be expressed as a time-varying function of itself, other 
signals, the environment, and noise. The process of identi- 
fying faults in a particular signal is identical to that of 
analyzing this function. Where the relationship is constant, 
i.e. follows previous assumptions, we can conclude that no 
physical change has taken place and the signal is nominal. 
However, the function is likely to be extremely complex and 
nonlinear. Environmental variables may be unmeasurable or 
unidentified. Lastly, the interaction between signals may be 
largely unknown. For this reason it is more efficient to study 
invariant features of the signals rather than the entire prob- 
lem. 

Because we do have the different signal measurements 
available, we can consider relationships between signals 
separately and effectively decouple the problem. A good 
candidate feature is signal cross-correlation. By studying 
this or a similar feature rather than the raw signals, we have 
reduced our dependence on external factors and have sim- 
plified the scope of the problem. 

In the case of the SIE we will use a slightly different 
feature across pairs of signals, which we refer to as the 
coherence coefficient. It is chosen instead of the ordinary 
coefficient of linear correlation in order to take advantage of 
certain “nice” mathematical properties. This coefficient, 
when calculated for all possible pairs of N signals, describes 
an NxN matrix of values. The matrix is referred to as the 
Coherence Matrix of the system. 

The coherence matrix, when computed from live stream- 
ing data, is an evolving object in time with repeatable 
convergence rates. Study of these rates allows us to segment 
the incoming data according to mode switches, and to match 
the matrix against pre-computed nominal data. 

For the purpose of this discussion, a “Mode” refers to a 
specific use or operation of the system in which the coher- 
ence coefficients are steady. In other words, the underlying 
physical relationships between parameters may change, but 
should remain constant within a single mode. These modes 
are determined from training data for the purpose of detector 
optimization. Ordinarily they do correspond to the more 
familiar “modes,” which represent specific commands to or 
configurations of the system, but they need not be identical. 
Frequently such commands will not appreciably alter the 
physics of the system, and no special accounting is needed. 

Comparison of the runtime coherence matrix to a pre- 
computed, static library of coherence plots, taking into 
account the convergence behavior of the computation, is an 
effective means of anomaly detection and isolation to one or 
more signals. 

Unfortunately, this comparison is only meaningful if we 
can guarantee our present coherence values do not reflect 
mixed-mode data, and so some method of segmentation 
must be found. For purposes of anomaly detection, mode 
boundaries can be detected by monitoring the self- 
consistency of the coherence coefficients. As each new 
sample of data is included into the computation, a matrix 
average for the resulting change is extracted and compared 
against the expected convergence rate. A change in the 
convergence rate implies a new mode has been entered and 
the computation must be restarted. 
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Between detected mode transitions, the difference 
between the computed and expected coherence allows us to 
optimally distinguish between nominal and anomalous con- 
ditions. Violation of this convergence relationship indicates 
5 a shift in the underlying properties of the data, which 
signifies the presence of an anomaly in the general sense. 
The convergence rate of this relationship, used for fault 
detection, is considerably slower than that for data 
segmentation, though still fast enough to be practical. 

10 Once a fault has been indicated, the next step is to isolate 
the signals contributing to that fault. This is done using the 
difference matrix, which is formed from the residuals fol- 
lowing coherence comparison against the library. 

Because nearly every autonomous system relies upon 
15 performance data for operation as well as fault protection, 
this method is applicable to a wide variety of situations. The 
detector increases in accuracy as the number of sensors 
increases; however, computational cost and mode complex- 
ity eventually place a practical limit on the size of the system 
20 to be treated. At the extremes, this method has been suc- 
cessfully applied to systems as small as four sensors and as 
complex as 1,600 of radically varying type. 

The analysis involves computation of an NxN matrix, and 
therefore the computation scales quadratically with the 
25 number of signals; fortunately, most designs are implicitly 
hierarchical, allowing the problem to be separated if com- 
putational costs are too great. Furthermore, typical systems 
or subsystems exhibit a reasonable number of sensors for 
this method given the state-of-the-art in flight processors. A 
30 real-world example of this technique using a single embed- 
ded flight processor (PowerPC clocked at 200 MHz) dem- 
onstrated real-time capability on a 240-sensor system 
sampled at 200 Hz. 

Another key virtue of this approach is its resilience in the 
35 face of novelty. The coherence between signals is a very 
repeatable property in general, especially as compared to 
environmental variable or nonlinear terms in the signals 
themselves. This repeatability allows us to quickly deter- 
mine whether or not the coherence is consistent with any of 
40 the training data, and therefore can be used as an efficient 
novelty detector, regardless of its cause. 

Training of this component is entirely data-driven. In 
order to be most effective, the component should be trained 
with examples of nominal operating data covering every 
45 major mode of operation. These examples should be suffi- 
ciently detailed to capture the expected system dynamics. 
While the actual training of the detector is completely 
autonomous once the task of separating nominal and anoma- 
lous data has been performed, this task — and the collection 
50 of the data — can be difficult. 

In cases where the training data is difficult to obtain or 
identify, the component functions best in a “learning” mode, 
similar to its performance following anomaly detection. If 
we expect to see novel data that does not indicate faults, we 
55 must provide for a feedback to the detector, which is a 
human-intensive process. Novel data can be tagged by the 
detector and archived for study. Following classification as 
nominal or anomalous, the detector can “retrain” using this 
data and continue. This technique has been used effectively 
60 in the study of untested systems. 

In cases where sensor data is relatively isolated, or when 
sample rates preclude the resolution of system dynamics, 
this method is not likely to be effective. These cases place a 
much greater reliance upon the symbolic method compo- 
65 nents of BEAM. 

Dynamical Invariant Anomaly Detector: The dynamical 
invariant anomaly detector 208 tracks parameters of indi- 
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vidual signals to sense deviations. The dynamical invariant 
anomaly detector is designed to identify and isolate anoma- 
lies in the behavior of individual sensor data. 

Traditional methods detect abnormal behavior by analyz- 
ing the difference between the sensor data and the predicted 
value. If the values of the sensor data are deemed either too 
high or low, the behavior is abnormal. In our proposed 
method, we introduce the concept of dynamical invariants 
for detecting structural abnormalities. 

Dynamical invariants are governing parameters of the 
dynamics of the system, such as the coefficients of the 
differential (or time -delay) equation in the case of time- 
series data. Instead of detecting deviations in the sensor data 
values, which can change simply due to different initial 
conditions or external forces (i.e. operational anomalies), we 
attempt to identify structural changes or behavioral changes 
in the system dynamics. While an operational abnormality 
will not lead to a change in the dynamical invariants, a true 
structural abnormality will lead to a change in the dynamical 
invariants. In other words, the detector will be sensitive to 
problems internal to the system, but not external distur- 
bances. 

We start with a description of a traditional treatment of 
sensor data given in the form of a time series describing the 
evolution of an underlying dynamical system. It will be 
assumed that this time series cannot be approximated by a 
simple analytical expression and does not possess any 
periodicity. In simple words, for an observer, the future 
values of the time series are not fully correlated with the past 
ones, and therefore, they are apprehended as random. Such 
time series can be considered as a realization of an under- 
lying stochastic process, which can be described only in 
terms of probability distributions. However, any information 
about this distribution cannot be obtained from a simple 
realization of a stochastic process unless this process is 
stationary — in this case, the ensemble average can be 
replaced by the time average. An assumption about the 
stationarity of the underlying stochastic process would 
exclude from consideration such important components of 
the dynamical process as linear and polynomial trends, or 
harmonic oscillations. Thus we develop methods to deal 
with non-stationary processes. 

Our approach to building a dynamical model is based 
upon progress in three independent fields: nonlinear 
dynamics, theory of stochastic processes, and artificial neu- 
ral networks. 

After the sensor data are stationarized, they are fed into a 
memory buffer, which keeps a time history of the sensor data 
for analysis. We will study critical signals, as determined by 
the symbolic components of BEAM, the operating mode, 
and the cross-signal methods outlined above. The relevant 
sensor data is passed to a Yule-Walker parameter estimator. 
There, the dynamical invariants and the coefficients of the 
time-delay equation are computed using the Yule-Walker 
method. 

Once the coefficients are computed, they will be com- 
pared to the ones stored in a model parameter database. This 
contains a set of nominal time-delay equation coefficients 
appropriate for particular operating mode. A statistical com- 
parison will be made between the stored and just-computed 
coefficients using a bootstrapping method, and if a discrep- 
ancy is detected, the identity of the offending sensor will be 
sent on. 

Further analysis is carried out on the residual or the 
difference between the sensor data values and the model 
predicted values, i.e. the uncorrelated noise, using a nonlin- 
ear neural classifier and noise analysis techniques. The 
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nonlinear neural classifier is designed to extract the nonlin- 
ear components, which may be missed by the linear Yule- 
Walker parameter estimator. The weights of the artificial 
neural network, another set of dynamical invariants, will be 
5 computed and compared with nominal weights stored in the 
model parameter database. Similarly, the noise 
characteristics, such as the moments of probability 
distribution, are dynamic invariants for stationarized sensor 
data, and will be compared with those stored in the Model 
10 Parameter Database. If any anomalies are detected in either 
the nonlinear components or the noise, the identity of the 
sensor will be sent to the channel anomaly detector. 

Finally, the channel anomaly detector aggregates infor- 
mation from the Yule-Walker parameter estimator, nonlinear 
15 neural classifier, and noise analysis modules, and classifies 
the anomaly before sending the fault information to the 
Predictive Comparison module, which is discussed below. 

Like the SIE described above, training of this detector is 
data-driven. It has similar requirements in terms of data set 
20 and human involvement. Also like the SIE, insufficient data 
training will result in false alarms, indicating novelty, until 
data collection and review during flight operations produce 
a sufficiently large data set to cover all nominal operating 
modes. 

25 Also like the SIE, this method is only likely to be effective 
if system dynamics are captured in the sensor data. 
However, this method is effective on isolated sensors, 
though it is often not sensitive to interference or feedback 
faults that manifest on no particular sensor. 

30 Informed Maintenance Grid (IMG): (Optional) The 
informed maintenance grid 210 studies evolution of cross- 
channel behavior over the medium- and long-term operation 
of the system. Tracking of consistent deviations exposes 
degradations and lack of performance. This component is 
35 optional and is not a necessary component to the operation 
of the invention. 

The IMG itself is a three-dimensional object in informa- 
tion space, intended to represent the evolution of the system 
through repeated use. The IMG is constructed from results 
40 from the SIE described above, specifically the deviations in 
cross-signal moments from expected values, weighted 
according to use and averaged over long periods of opera- 
tion. The cross-signal moments are a persistent quantity, 
which amplifies their value in long-term degradation esti- 
45 mation. 

There are two convincing reasons to consider cross-signal 
residuals in this fashion. First is the specific question of 
degradation and fault detection. Degradation typically mani- 
fests as a subtle change in operating behavior, in itself not 
50 dramatic enough to be ruled as a fault. This emergent 
behavior frequently appears as subthreshold residuals in 
extracted signal features. As described above, statistical 
detection methods are limited in sensitivity by the amount of 
data (both incident and training data) that can be continu- 
55 ously processed. However, tracking of consistent residuals 
over several such experiments can lead to a strong implica- 
tion of degraded performance. 

The second reason addresses functional health of the 
system. In addition to tracking the magnitude and consis- 
60 tency of residuals arising from degradation, we can also 
observe their spread to other signals within the system and 
track their behavior relative to different modes of usage. 

The IMG produces a long-term warning regarding 
system-wide degradations. This differs slightly from the 
65 companion Prognostic Assessment module, which concerns 
itself with individual signals and preestablished operating 
limits. However, the IMG is also useful with regard to novel 
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degradations. Such are the norm with new advanced 
systems, as predicting degradation behavior is very difficult 
and much prognostic training must be done “on the job.” 

Visually, the three-dimensional object produced by the 
IMG is an easily accessible means of summarizing total 
system behavior over long periods of use. This visual means 
of verifying IMG predictions makes BEAM easily adapted 
for applications with human operators present. 

Prognostic Assessment: The prognostic assessment com- 
ponent 212 makes a forward-projection of individual signals 
based on their model parameters. Based upon this, it estab- 
lishes a useful short-term assessment of impending faults. 

The channel level prognostics algorithm is intended to 
identify trends in sensor data which may exceed limit values, 
such as redlines. This by itself is a common approach. 
However, given the richness of feature classification avail- 
able from other BEAM components, it is highly effective to 
update this procedure. A stochastic model similar in prin- 
ciple to the auto -regressive model is used to predict values 
based upon previous values, viz. forecasting. The aim is to 
predict, with some nominal confidence, if and when the 
sensor values will exceed its critical limit value. This 
permits warning of an impending problem prior to failure. 

In general, time -series forecasting is not a deterministic 
procedure. It would be deterministic only if a given time 
series is described by an analytical function, in which case 
the infinite lead time prediction is deterministic and unique 
based upon values of the function and all its time derivatives 
at t=0. In most sensor data, this situation is unrealistic due 
to incomplete model description, sensor noise, etc. In fact, 
present values of a time series may be uncorrelated with 
previous values, and an element of randomness is introduced 
into the forecast. 

Such randomness is incorporated into the underlying 
dynamical model by considering the time series for t^O as 
a realization of some (unknown) stochastic process. The 
future values for t>0 can then be presented as an ensemble 
of possible time series, each with a certain probability. After 
averaging the time series over the ensemble, one can rep- 
resent the forecast as the mean value of the predicted data 
and the probability density distributions. 

The methodology of time series forecasting is closely 
related to model fitting and identification. In general, the 
non-stationary nature of many sensor data may lead to 
misleading results for future data prediction if a simple 
least-square approach to polynomial trend and dominating 
harmonics is adopted. The correct approach is to apply 
inverse operators (specifically difference and seasonal dif- 
ference operators) to the stationary component of the time 
series and forecast using past values of the time series. 

To implement this module, we begin by feeding a Pre- 
dictor stationary data, the auto regressive model coefficients, 
past raw data values, and limit values; i.e., everything 
required to evaluate the prediction plus a redline value at 
which to stop the computation. The predictor will generate 
many predictions of time to redline and pass them on to the 
Redline Confidence Estimator. The Redline Confidence 
Estimator will then construct a probability distribution of the 
time when the channel value will exceed the redline limit. 
Finally, the Failure Likelihood Estimator takes the probabil- 
ity distribution and computes the likelihood (probability) 
that the channel value may exceed the redline value within 
some critical time. If the probability exceeds a certain preset 
threshold as determined by the application, e.g. 99% 
confidence, then the critical time and its probability will be 
sent to the symbolic components. 

Predictive Comparator: The predictive comparator com- 
ponent 214 compares the requested and commanded opera- 


14 

tion of the system versus the sensed operation as interpreted 
from the time-varying quantities. Its goal is to detect mis- 
alignment between system software execution and system 
hardware operation. This is a principal concern, as we are 
5 dealing with systems that rely on a large degree of software 
control, if not complete autonomy. 

Causal System Model: The causal system model 216 is a 
rule-based connectivity matrix designed to improve source 
fault isolation and actor signal identification. In the SDM, 
10 the entire domain knowledge is represented as If-Then rules 
only. When the domain is very large and complex, an 
entirely rule-based representation and associated inference 
leads to a large and inefficient knowledge base, causing a 
very poor focus of attention. To eliminate such unwieldy 
15 knowledge bases in the SDM engine, we provide a causal 
system model. This component simplifies the problem by 
looking for relationships between observations in the data to 
fill in blanks or gaps in the information from the SDM. 

Interpretation Layer: The interpretation layer 218 collates 
20 observations from separate components and submits a single 
fault report in a format usable to recovery and planning 
components or to system operators. It submits a single fault 
report in a format usable to recovery and planning compo- 
nents (in the case of a fully autonomous system) or to system 
25 operators. This is a knowledge -based component that is 
totally dependent upon the domain and the desired format of 
the output. 

In the following sections, we will examine the individual 
components in additional detail. 

30 

4.0 Component Descriptions 
4.1 Coherence-based Fault Detection (SIE) 

A coherence-based fault detector 206 according to the 
35 invention is a method of anomaly detection from time- 
correlated sensor data. In and of itself, the coherence -based 
fault detector is capable of fault detection and partial clas- 
sification. The method is applicable to a broad class of 
problems and is designed to respond to any departure from 
40 normal operation of a system, including faults or events that 
lie outside the training envelope. Further examples and 
clarifying details of this aspect of the invention can be found 
in Appendix C attached herewith. 

In an embodiment of the invention, the coherence -based 
45 fault detector 206 is based on a System Invariance Estimator 
(SIE) which is a statistical process for examining multi- 
signal data that lies at the heart of this aspect of the 
invention. As input, the coherence -based fault detector 
receives multiple time -correlated signals. The detector com- 
50 pares their cross-signal behavior against a fixed invariant 
library constructed during a training process (which is itself 
data-driven using the same time -correlated signals). It 
returns the following quantities: 

Mode -specific coherence matrix 
55 / 

Event detection 

Comparative anomaly detection 

Anomaly isolation to specific signals 

Distance measure of off-nominal behavior 
60 As a first step of analysis, this computation makes a 
decision whether or not a fault is present, and reduces the 
search space of data to one or a few signals. Time markers 
are included to indicate the onset of faulted data. These 
conclusions, which can be drawn for nearly any system, are 
65 then passed to other analysis components for further feature 
extraction, correlation to discrete data events, and interpre- 
tation. 
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In this section we will motivate the cross-signal approach 
to fault detection. Because quantitative information is so 
readily available, approaches grounded in signal processing 
are likely to be effective. The method described here has two 
distinct advantages. The first is its broad range of 
applicability — the module described here has been used to 
successfully fuse sensor and computed data of radically 
different types, on numerous systems, without detailed sys- 
tem knowledge and with minimal training. The second is its 
ability to detect, and with few exceptions correctly resolve, 
faults for which the detector has not been trained. This 
flexibility is of prime importance in systems with low 
temporal margins and those with complex environmental 
interaction. 

Consider a continuously valued signal obtained for 
example by a sensor measuring some aspect of a system, 
sampled uniformly. Provided this signal is deterministic, it 
can be expressed as a time-varying function: 


16 

We must retain a consideration to operating mode hinted 
at above. For the purpose of this discussion, a mode implies 
a particular set of relational equations that govern each 
signal. In other words, the operating physics of the system 
can differ between modes but is assumed to be constant 
within a mode. These modes are ordinarily a direct match to 
the observable state of the system; i.e., inactive, startup, 
steady-state, etc. Mode differs from the external environ- 
ment in that it is a measure of state rather than an input to 
the system’s behavior. 

Provided we can correctly account for operating mode, we 
then have a much simplified set of relations to study, namely 
those between pairs of signals, or in the more general sense 
each signal versus the larger system. Faults in the system can 
be expected to manifest themselves as departures from the 
expected relationships. For this reason, the study of corre- 
lations between the signals is singularly useful as a generic 
strategy. 

4.1.2 Coherence Estimation 


( 4 . 1 . 1 ) 

In the above expression, we have identified the signal as 
a function of itself and other signals, as expressed by {S^t)}, 
and of the environment, which may contain any number of 
relevant parameters {E(t)}. There is also a noise term e(t) 
included to reflect uncertainties, in particular actual sensor 
noise that accompanies most signals in practice. 

The process of identifying faults in a particular signal is 
identical to that of analyzing the function ,f(t). Where this 
relation remains the same, i.e. follows the original 
assumptions, we can conclude that no physical change has 
occurred for that signal, and therefore the signal is nominal. 
Such is the approach taken by model-based reasoning 
schemes. 

However, the function f for each signal is likely to be 
extremely complex and nonlinear. The environmental vari- 
ables may be unknown and unmeasurable. Lastly, the spe- 
cific interaction between signals may also be unknown, for 
instance in the case of thermal connectivity within a system. 
For this reason, it is more efficient and more generally 
applicable to study invariant features of the signals rather 
than the full-blown problem. 

One excellent candidate feature for study is cross- 
correlation between signals. By studying this computed 
measurement rather than signals individually, we are reduc- 
ing the dependence on external factors (i.e. environmental 
variables) and thus simplifying the scope of the problem. 

Cross-correlative relationships between signals, where 
they exist, remain constant in many cases for a given mode 
of system operation. The impact of the operating 
environment, since we are dealing with time-correlated 
signals, applies to all signals and thus can be minimized. 
This approach is essentially the same as decoupling the 
expression above, and choosing to study only the simpler 
signal- to -signal relationships, as follows: 

(0 ( 4 . 1 . 2 ) 

For a realistic system, this hypothesis is easy to support. 
In most cases, relationships between signals that represent 
measured quantities are readily apparent. The environmental 
contribution can be considered an external input to the 
system as a whole rather than being particular to each signal. 
The sensor itself is the source of most of the noise, and it too 
can be separated. Even where such separability is not 
explicitly valid, it is likely to be a good approximation. 


Two common measures of second-order cross-signal sta- 
tistics are the Covariance and the Coefficient of Linear 
Correlation. Covariance is a good measure of similar behav- 
ior between arbitrary signals, but it suffers from a number of 
25 difficulties. One such problem is that a covariance matrix 
will be dominated by the most active signals, viz. those with 
the greatest variance. In order to avoid this, covariance is 
typically normalized by the relative variances of the signals, 
as in the Correlation Coefficient. However, such a normal- 
30 ization is often overly simplistic and leads to the inverse 
problem. A correlation matrix tends to become ill- 
conditioned in the presence of signals with relatively low 
variances. 

Returning to the original goal, we are interested in com- 
35 paring signals. This should take into account both the 
covariance and the relative variances of the signals. In 
accordance with the invention, we define a coherence coef- 
ficient expressed below as: 

40 ... = ICovffi, (4.0) 

^ Max(V ar (S ; ), Var (Sj)) ' 


We have used the standard definitions: 

45 

1 r _ ( 4 . 1 . 4 ) 

Co \(S h Sj) = - I (Si - Si)(Sj - Sj)dt 

Var(.S '■) = - f(S i —S;) 2 dt (4L5) 

50 U 

The choice of the maximum variance in the denominator 
guarantees a coherence coefficient value normalized to [-1, 
1]. Furthermore, the absolute value is taken because the sign 
55 of the relation is of no importance for arbitrary signals, only 
the existence or nonexistence of a causal connection. A 
coherence value close to 0 implies no relationship between 
signals, whereas a value approaching 1 indicates a very 
strong relationship. Consequently, the coherence coefficient 
60 is normalized to [0,1]. 

Given N data streams, this calculation defines an NxN 
coherence coefficient matrix where each entry represents a 
degree of causal connectivity. Where relationships between 
signals are fixed, i.e. during a single mode of system 
65 operation, the coherence coefficient between those two sig- 
nals will remain constant within the bounds of statistical 
uncertainty. Provided the coherence coefficient converges, 
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this relationship is repeatable, and so it cab be used as a basis 
for comparison between training and run-time data. 

Admittedly the above assertions are too strict for a 
real-world example. Systems may indeed contain signals 
with fluctuating or drifting relationships during nominal 
operation. Additionally, the requirement to maintain a count- 
able number of modes may force us to simplify the system 
state model, to the detriment of repeatability. We will 
examine modifications to the above to mitigate these con- 
cerns. 

4.1.3 Event Detection 

Having understood that cross-channel measurements are 
an effective method of signal analysis, we next explore how 
to best apply the calculation above. A first question to ask is 
how the data should be gathered. From the discussion above, 
it is noted that we should avoid applying this operator to 
mixed-mode data. Such data represents a combination of 
two separate sets of underlying equations for the signals, 
thus mixed-mode correlations are not necessarily repeatable. 

A mode (“mode of operation”, “system mode”, etc.) 
signifies a different type of system operation. Let’s say we 
were going to build a numerical simulation of the system. 
For each mode, we need a separate model, or at least model 
components special to that mode. Different system operation 
can be caused by a configurational change or significant 
environmental change. A system may enter a new mode of 
operation by command or as a consequence of a faults. 

Consider an automobile engine, for example. While “Off” 
it would be in one (very boring) mode. “Start” would be 
another, which is different from “Run” because during 
“Start” the electric starter is active. Once in “Run,” changing 
fuel rates to accelerate would not be considered a new 
mode — basic operation is not changed, a model of operation 
for the engine would not require different equations. The 
transition from “Off” to “Start” to “Run” is commanded by 
the driver. Using this same example, fault modes might also 
be configurational or environmental. A fuel pump degrada- 
tion might put the engine into the “Lean” faulty mode. Poor 
quality gasoline (using octane rating as an environmental 
variable) might result in “pinging.” Faults can also be 
commanded, for instance turning on the starter while the 
engine is already running, resulting in a superposition of 
“Run” and “Start” modes that is different from both and 
outside the design specifications of the engine. 

Typical applications avoid the issue of mode switching 
through one of the following methods: 

a) Compute only correlations having a fixed, mode- 
independent relationship. 

This method is effective in reliable fault detectors, 
however the system coverage is typically very lim- 
ited. The method is restricted to well-understood 
signal interactions and is not generalizable. 
(However, see Section 4.1.5, where we attempt to 
redress this philosophy.) 

b) Window the data such that near-steady-state operation 
can be assumed. 

This procedure also carries significant inherent limita- 
tions. Because the computation, whichever is used, is 
statistical in nature, selection of a fixed window size 
places a hard limit on latency as well as confidence 
of detection. This also does not directly address the 
core problem. 

c) Window the computation according to external state 
information, such as commands. 

This is the best approach, and it is used in the full 
formulation of BEAM. However, it too has limits. 
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External state information may not be available. 
Additionally, there may not be a perfect alignment 
between discrete “operating modes” and observable 
shifts in the system — it may not be one-to-one. 

5 Our solution to the mixed-mode problem in accordance 
with the invention is based upon the following observations. 
Consider a pair of signals with a fixed underlying linear 
relationship, subject to Gaussian (or any other zero-mean) 
random noise. The coherence calculation outlined in Section 
10 4.1.2 above will converge to a fixed value, according to the 
following relationship: 




The exact rate of convergence depends on the relative 
contribution from signal linear and noise components as 
well as the specific character of signal noise. However, in 
practice, it is much easier to determine the relationship 
empirically from sampled data. 

Given the convergence relationship above, we can define 
a data test in order to assure single-mode computation. By 
adopting this approach, we can successfully separate steady- 
state operation from transitions. This means: 

a) Transition detection is available for comparison to 
expected system behavior. 

A “transition” in this case is a switch from one mode to 
another. Most of these are predictable and nominal. 
On the other hand, a broad class of system faults can 
be considered transitions, particularly those involv- 
ing sudden electrical failure or miscommand sce- 
narios. Unexpected events in the system immediately 
merit further analysis. 

b) Calculated coherence uses the maximum amount of 
data available to make its decisions, which optimizes 
sensitivity and confidence. 

Use of the convergence rate establishes a time- varying 
estimate of confidence in the calculation, which is 
transparent to the final output of the detector. The 
time -variance also applies to the values of the com- 
puted coherence, which we will study in further 
detail in the following section. 

The quantity 

is computed and referred to as the coherence stability. This 
single parameter is a good indicator of steady -state behavior. 

One observation regarding stability is that the conver- 
gence rate is quite fast. This character allows us to make 
confident decisions regarding mode transitions with rela- 
tively little data to study. This also lends credibility to more 
complex and subtle fault detection using a coherence -based 
strategy. 


4.1.4 Comparative Fault Detection 


In the previous section, we identified a method to qualify 
data entering the detector in terms of mode consistency 
60 based upon convergence properties of the calculation. Next 
we will use a similar strategy to differentiate between 
nominal and faulty data, where the fault manifests itself as 
a drift rather than a transition. Such a fault case is more 
physically interesting than a sudden transition, since we are 
65 concerned about a lasting effect upon the system rather than 
an instantaneous data error. Suppose we have a current £~(t) 
estimate that we are comparing to a different estimate, call 
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it £ 0 . As we accumulate more data, the estimate is expected 
to converge at the following rate: 

IT (4.1.7) 

\ m -&\ 5 


This relationship determines the accuracy of the calcula- 
tion’s raw value, which is representative of the equation 
between the two signals. It is conceptually similar to the io 
error in estimated mean for a statistical sampling process. 

We can use this relationship to detect a shift in the equations, 
much in the manner that events are detected above. 


The computed quantity |£ I y(t)-£ 0 | is calculated and 
referred to as the coherence deviation. When compared with 
the base convergence rate, it is a measurement of confidence 
that the coherence relationship is repeating its previous 
(nominal) profile. Between detected mode transitions, as 
discussed in the previous sections, use of this relationship 
allows us to optimally distinguish between nominal and 
anomalous conditions. Violation of this convergence rela- 
tionship indicates a shift in the underlying properties of the 
data, which signifies the presence of an anomaly in the 
general sense. 

Note that the convergence rate of this relationship is 
considerably slower, though still fast enough to be practical. 
Because of this, it is particularly valuable to adapt a 
variable -windowing scheme where data is automatically 
segmented at mode boundaries. 

4.1.5 Optimization 
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The sections above define a method of generic cross- 
signal computation and identify properties that facilitate 
decisions about the data. In this section we will examine 
how to best apply these properties. 35 

The convergence properties above are written for each 
individual signal pair. In order to apply this approach in 
general to a system with N signals, we have 0(N 2 ) signal 
pairs to process. At first glance, the approach does not appear 
to lend itself to scaling. For this reason, most cross-signal 40 
approaches focus on preselected elements of the matrix, 
which cannot be done without considerable system knowl- 
edge or examples of anomalous data from which to train. 

However, there are some advantages to studying the full 
matrix. For the general case, we may not know a priori 45 
which signal pairs are significant. Additionally, there are 
likely to be numerous interactions for each signal, which 
may vary depending on the mode of operation. Only in rare 
cases are the individual elements of the matrix of particular 
interest. In the general case, we are concerned with signal 50 
behavior versus the entire system, which corresponds to an 
entire row on the coherence matrix. 

Because we are more concerned with the overall system 
performance, we should instead consider a single global 
measure based on the entire matrix. This requires some sort 55 
of matrix norm. 


Many matrix norms exist. In this particular embodiment 
of the invention, we shall use the following, where M is an 
arbitrary N-by-N matrix: 

60 


ii"ii = p 



M; 


(4.1.8) 


placement. An absolute value is used because we are prin- 
cipally concerned with the magnitude of differences between 
one matrix and another, rather than their sign. (An exception 
to this: Following the detection of an anomaly, for purposes 
of identification the sign can be important, as faults that 
cause an increase in coherence are typically more physically 
complex and more interesting.) The choice to average row 
totals rather than each individual element is motivated by the 
inherent structure of the coherence matrix, specifically the 
fact that each row represents a single signal’s total contri- 
bution. By averaging the rows prior to their summation we 
hope to counteract noise present in the calculation, whereas 
differences due to a significant shift are likely to be of the 
same sign. 

We can substitute the norm into the convergence relation- 
ships Eq. (4.1.6) and Eq. (4.1.7) above without changing 
their character: 


||^(f)-^(r-l)||~-,and 


(4.1.9) 





(4.1.10) 


where M in Eq. (4.1.8) is replaced with (^-(t)-^-(t-l)) and 
(5y(t)-^ 0 ) respectively, and N=ixj. 

The stability and deviation on the left are now indicative 
of the entire matrix. This produces a tradeoff between 
individual pair sensitivity and false-alarm reduction, while 
at the same time greatly reducing computational cost. 

Another adaptation to this approach is to consider sepa- 
rate weighting of different pairs. It is clear that some signal 
pair relationships will be well defined while others will be 
pseudorandom. Additionally, we have adopted the concept 
of multiple modes to handle different relationships at dif- 
ferent phases of system operation. This can become an 
unbounded problem, and a mechanism is needed to guaran- 
tee a small number of modes. 

Let us introduce a weighting matrix W £ - into the conver- 
gence relationships above: 


WWuZuW-Wueuit-m- 


l 

J 


(4.1.11) 


WWijZjW-WijZoW- 



(4.1.12) 


The matrix is a companion to the training matrix 
and is computed as part of the training cycle. For a general 
application, i.e. an application for which no signal relation- 
ships are known or suspected, it is computed by normalizing 
each signal pair coherence by the observed variance in that 
coherence. This normalization matrix, along with the model 
coherence £ 0 , can be later combined with other coherence/ 
normalization pairs in order to combine modes or enhance 
training data results with new data. 

4.1.6 Post-processing 

If an unknown fault has been detected, the next step is to 
isolate the responsible signals. This is done by studying the 
difference matrix: 


65 

The norm chosen here differs from the simple matrix 
average in one detail, namely the absolute value and its 


D-W^tyio) (4.1.13) 

Given an anomaly on one signal, we expect to see the 
correlation between this signal and all others diminish 
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compared to the expected values. There may be stronger 
shifts between some signals and others, but in general the 
coherence values will decrease. Visually this leads to a 
characteristic “cross-hair” appearance on the rendered dif- 
ference matrix. 5 

The total deviation for each signal is computed by sum- 
ming the coherence difference (absolute values) over each 
row of the matrix. The ranking module 306 provides a 
ranking of these deviations to determine the most likely 
contributors to the faults. This channel implication is passed io 
to interpretive elements of the invention and to single-signal 
analysis modules. 

In general an anomaly will manifest as a decrease in 
coherence between signals. However, there are rare cases 
where coherency will increase. Typically this is not system- 15 
wide but is isolated to a few specific pairs. Such an increase 
in coherency is indicative of a new feedback relationship 
occurring in the system, and it must be given special 
attention. 

Such cases, physically, define previously unknown modes 20 
of the system. This mode may be nominal or faulty. In the 
former case, such detection implies that the training data 
used to tune the detector does not adequately cover the 
operations space, and must be expanded. In the latter case, 
knowledge of what specific signals or pairs are anomalous 25 
can directly lead to better understanding of the problem, 
particularly in cases where causal or physical models are 
available to the diagnostic engine. 

The underlying principle is that large deviances (taking 
weighting into account) are probably key contributors. 30 
However, in addition to this is the question of phase. Signal 
pairs that show a positive phase, i.e., increased coherence, 
are more interesting than signal pairs that show a decreased 
coherence. Furthermore, a signal that shows an increase in 
coherence with a particular signal, but with a decrease with 35 
respect to all other signals is remarkable and thus of great 
interest. Such signals are identified by the key signals 
module 310. 

4.1.7 Architecture 

40 

We have discussed an underlying theory for cross-signal 
event and fault detection. An operating architecture to 
implement this approach is given in FIG. 3. 

Each sample of time-correlated, stationarized data is 
passed to the Incremental Coherence Estimator 302, where 45 
Eq. (4.1.3) is updated for each signal pair. The coherence 
stability is computed over the matrix, and is checked against 
relationship Eq. (4.1.11) in the Convergence Rate Test 306. 

If this test fails, this indicates the presence of mixed mode 
data and so the coherence estimate is reset for another pass. 50 
This is repeated until the relationship Eq. (4.1.11) is satis- 
fied. 

After the test above, we are guaranteed a coherence 
estimate free of mixed-mode data. The estimate is compared 
against the expected coherence supplied by the Coherence 55 
Library 312, as selected by the symbolic model 204 and 
command data. The match is checked against relation Eq. 
(4.1.12) by the Coherence comparator 304. 

If we have a mismatch that compares favorably to an 
abnormal library coherence, we have a known fault, which 60 
will be flagged according to the fault number and passed to 
the interpreter. This is the “known bad coherence” path 
shown in FIG. 3. 

If we cannot find a suitable match, as is more frequently 
the case, the differenced coherence Eq. (4.1.13) is examined 65 
to extract the key actor signals and pairs. This processing is 
discussed above in Section 4.1.6. 
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At the end of this operation, we will have successfully 
identified normal versus anomalous operation of the system 
as a whole. For those cases where anomalous conditions are 
detected, we have isolated the effect to a known case or to 
the key measurements that led us to that conclusion. This 
has, in essence, digitized the problem into terms that the 
interpreter can understand, as will be discussed in section 
4.5 below. 

4.2 Single Channel Feature Extraction 

The Dynamical Invariant Anomaly Detector 208 is 
designed to identify and isolate anomalies in the behavior of 
individual sensor data. Traditional methods detect abnormal 
behavior by analyzing the difference between the sensor data 
and the predicted value. If the values of the sensor data are 
deemed either too high or low, the behavior is abnormal. In 
accordance with the present invention, we introduce the 
concept of dynamical invariants for detecting structural 
abnormalities. Dynamical invariants are governing param- 
eters of the dynamics of the system, such as the coefficients 
of the differential (or time-delay) equation in the case of 
time-series data. Instead of detecting deviations in the sensor 
data values, which can change simply due to different initial 
conditions or external forces (i.e. operational anomalies), we 
attempt to identify structural changes or behavioral changes 
in the system dynamics. While an operational abnormality 
will not lead to a change in the dynamical invariants, a true 
structural abnormality will lead to a change in the dynamical 
invariants. In other words, the detector will be sensitive to 
problems internal to the system, but not external distur- 
bances. 

4.2.1 Dynamical Model 

We start with a description of a traditional treatment of 
sensor data given in the form of a time series describing the 
evolution of an underlying dynamical system. It will be 
assumed that this time series cannot be approximated by a 
simple analytical expression and does not possess any 
periodicity. In simple words, for an observer, the future 
values of the time series are not fully correlated with the past 
ones, and therefore, they are apprehended as random. Such 
time series can be considered as a realization of an under- 
lying stochastic process, which can be described only in 
terms of probability distributions. However, any information 
about this distribution cannot be obtained from a simple 
realization of a stochastic process unless this process is 
stationary. (In this case, the ensemble average can be 
replaced by the time average.) An assumption about the 
stationarity of the underlying stochastic process would 
exclude from consideration such important components of 
the dynamical process as linear and polynomial trends, or 
harmonic oscillations. In accordance with the invention, we 
provide methods to deal with non-stationary processes. 

Our approach to building a dynamical model is based 
upon progress in three independent fields: nonlinear 
dynamics, theory of stochastic processes, and artificial neu- 
ral networks. From the field of nonlinear dynamics, based 
upon the Takens theorem, any dynamical system which 
converges to an attractor of lower (than original) dimen- 
sionality can be simulated (with a prescribed accuracy) by a 
time-delay equation: 

■ ■ ■ x(t-2x), . . . pc{t-mx)\ ( 4 . 2 . 1 ) 

where x(t) is a given time series, and x=constant is the time 
delay. The function F represents some deterministic function 
representative of the system. 
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It was proven that the solution to Eq. (4.2.1) subject to 
appropriate initial conditions converges to the original time 
series: 

. . . ,etc., (4.2.2) 

if m in Eq. (4.2.1) is sufficiently large. 

However, the function F, as well as the constants t and m, 
are not specified by this theorem, and the most damaging 
limitation of the model of Eq. (4.2.1) is that the original time 
series must be stationary, since it represents an attractor. This 
means that for non-stationary time series the solution to Eq. 

(4.2.1) may not converge to Eq. (4.2.2) at all. Actually this 
limitation has deeper roots and is linked to the problem of 
stability of the model. 

Prior to the development of Takens theorem, statisticians 
have developed a different approach to the problem in which 
they approximated a stochastic process by a linear autore- 
gressive model: 

x(t)=a 1 x(t-l)+a^c(t-2)+ . . . +a n (t-n)+N,n^> co, (4.2.3) 

where a,- are constants, and N represents the contribution 
from noise. 

A zero-mean purely non-deterministic stationary process 
x(t) possesses a linear representation as in Eq. (4.2.3) with 

j= i 

(the condition of the stationarity). 

On first sight, Eq. (4.2.3) is a particular case of Eq. (4.2.1) 
when F is replaced by a linear function and t= 1. However, 
it actually has an important advantage over Eq. (4.2.1): It 
does not require stationarity of the time series Eq. (4.2.2). To 
be more precise, it requires certain transformations of Eq. 

(4.2.2) before the model can be applied. These transforma- 
tions are supposed to “stationarize” the original time series. 
These types of transformations follow from the fact that the 
conditions of stationarity of the solution to Eq. (4.2.3) 
coincide with the conditions of its stability. In other words, 
the process is non-stationary when 

(4.2.4) 

where G a are the roots of the characteristic equation asso- 
ciated with Eq. (4.2.3). 

The case IG-Ji^l is usually excluded from considerations 
since it corresponds to an exponential instability, which is 
unrealistic in physical systems under observation. However, 
the case lG-^1 is realistic. Real and complex conjugates of 
G 1 incorporate trend and seasonal (periodic) components, 
respectively, into the time series Eq. (4.2.2). 

By applying a difference operator: 

Vx t =x-x t _ 1 =(l-B)x t (4.2.5) 

where B is defined as the backward shift operator, as many 
times as required, one can eliminate the trend from the time 
series: 

x t ^ t -i,x t - 2 , . . . ,etc. (4.2.6) 

Similarly, the seasonal components from time series Eq. 
(4.2.6) can be eliminated by applying the seasonal difference 
operator: 

V^=(l -B^x^x-x^. (4.2.7) 

In most cases, the seasonal differencing Eq. (4.2.7) should 
be applied prior to standard differencing Eq. (4.2.5). 
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Unfortunately, it is not known in advance how many times 
the operators Eq. (4.2.5) or Eq. (4.2.7) should be applied to 
the original time series Eq. (4.2.6) for their stationarization. 
Moreover, in Eq. (4.2.7) the period S of the seasonal 
5 difference operator is also not prescribed. In the next section 
we will discuss possible ways to deal with these problems. 

Assuming that the time series Eq. (4.2.6) is stationarized, 
one can apply to them the model of Eq. (4.2.1): 

10 

y(t)=F\y(t-l),y(t-2), . . . ,y{t-m)\ (4.2.8) 

where 

15 y^t- 1 , ■ ■ ■ ,etc.; (y l =x-x l _ 1 ) (4.2.9) 

are transformed series Eq. (4.2.6), and t= 1. After fitting the 
model of Eq. (4.2.8) to the time series Eq. (4.2.6), one can 
return to the old variable x(t) by exploiting the inverse 
20 operators (1-B) -1 and (1-B 5 ) -1 . For instance, if the station- 
arization procedure is performed by the operator Eq. (4.2.5), 
then: 

x(f)=x(f-l)+i 7 {[x(f-l)-x(f-2)] ;> [x(f-2)-x(f-3), . . . ,etc.}. (4.2.10) 

25 

Eq. (4.2.10) can be utilized for predictions of future 
values of Eq. (4.2.6), as well as for detection of structural 
and operational abnormalities. However, despite the fact that 
Eq. (4.2.8) and Eq. (4.2.10) may be significantly different, 
30 their structure is uniquely defined by the same function F. 
Therefore, structural abnormalities that cause changes of the 
function F can also be detected from Eq. (4.2.8) and con- 
sequently for that particular purpose the transition to Eq. 
35 (4.2.10) is not necessary. 

It should be noted that application of the stationarization 
procedure Eq. (4.2.5) and Eq. (4.2.7) to the time series Eq. 
(4.2.6) is only justified if the underlying model is linear, 
since the stationarity criteria for nonlinear equations are 
40 more complex than for linear equations, in similar fashion to 
the stability criteria. Nevertheless, there are numerical evi- 
dences that even in nonlinear cases, the procedures Eq. 
(4.2.5) and Eq. (4.2.7) are useful in that they significantly 
reduce the error, i.e., the difference between the simulated 
45 and the recorded data, if the latter are non-stationary. 

4.2.2 Model Fitting 

The models Eq. (4.2.8) and Eq. (4.2.10) which have been 
50 selected in the previous section for detection of structural of 
abnormalities in the time series Eq. (4.2.6) have the follow- 
ing parameters to be found from Eq. (4.2.6): the function F, 
the time delay t, the order of time delays m, the powers m 1 
55 and m 2 of the difference (1-B) ml and the seasonal difference 
(1-B 5 ) m2 , and the period s of the seasonal operator. 

If the function F is linear, the simplest approach to model 
fitting is the Yule -Walker equations, which define the auto 
regressive parameters a,- in Eq. (4.2.3) via the autocorrela- 
60 tions in Eq. (4.2.6). The auto regressive (AR) model has the 
form: 

?(*) = y a k y(t - kr) + w(r) (4.2.1 1) 

65 fei 
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where w,-(t) is uncorrelated white noise, and the AR 
coefficients, a-, can be determined by the Yule -Walker equa- 
tions: 


Pa=«iP*-1+«2Pa- 2+ ■ ■ ■ +«*P*- m 

where p* is the autocorrelation function: 


(4.2.12) 5 


(y(kt) - + kr) - Ji) 


Pk ■■ 


z 


iy{t)-ji) 2 Z (y(t + kr)-ji) 2 


(4.2.13) 

10 


and jii is the time average of y(t). 

In many cases the assumption about linearity of the 
underlying dynamical system leads to poor model fitting. It 
is sometimes more practical to assume from the beginning 
that F is a nonlinear (and still unknown) function. In this 2 o 
case, the best tool for model fitting may be a feed-forward 
neural network that approximates the true extrapolation 
mapping by a function parameterized by the synaptic 
weights and thresholds of the network. It is known that any 
continuous function can be approximated by a feed-forward 2 5 
neural net with only one hidden layer. For this reason, in this 
work a feed-forward neural net with one hidden layer is 
selected for the model of Eq. (4.2.8) fitting. The model of 
Eq. (4.2.8) is sought in the following form: 



w jk yi(t-kT) 


30 

(4.2.14) 


where W l7 - and w jk are constant synaptic weights, a(x)=tan 
h(x) is the sigmoid function, and y t -(t) is a function which is 
supposed to approximate the stationarized time series Eq. 
(4.2.9) transformed from the original time series. 

The model fitting procedure is based upon minimization 40 
of the error measure: 


E(W lj w jk ) -- 


^ ^ y ? (0 - 0-j ^ WijO- yi {t - kT) 


2 (4.2.15) 
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where y/(t) are the values of the time series Eq. (4.2.9). The 
error measure Eq. (4.2.15) consists of two parts: 

E=E ± +E 2 (4.2.16) 50 

where E 1 represents the contribution of a physical noise, 
while E 2 results from non-op timal choice of the parameters 
of the model of Eq. (4.2.14). 

There are two basic sources of random components E ± in 55 
time series. The first source is chaotic instability of the 
underlying dynamical system. In principle, this component 
of E 1 is associated with instability of the underlying model, 
and it can be represented based upon the stabilization 
principle introduced by Zak, M, in a paper entitled “Postin- 60 
stability Model in Dynamics,” Int. J. of Theoretical Physics 
1994, Vol. 33, No. 11. The second source is physical noise, 
imprecision of the measurements, human factors such as 
multi-choice decisions in economical or social systems or 
driving habits in case of the catalytic converter of a car, etc. 65 

This second component of E 1 cannot be represented by 
any model based upon classical dynamics, including Eq. 


(4.2.8). However, there are models based upon a special type 
of dynamics (called terminal, or non-Lipschitz-dynamics) 
that can simulate this component. In the simplest case one 
can assume that E a represents a variance of a mean zero 
Gaussian noise. 

The component E 2 , in principle, can be eliminated by 
formal minimization of the error measure Eq. (4.2.15) with 
respect to the parameters W 2 -, w^, t, m, m 1? m 2 , and s. 

Since there is an explicit analytical dependence between 
E and W ly -, w fk , the first part of minimization can be 
performed by applying back-propagation. However, further 
minimization should include more sophisticated versions of 
gradient descent since the dependence E(t, m, m 1? m 2 , s) is 
too complex to be treated analytically. 

4.2.3 Structural Abnormality Criterion 

As noted in the introduction, there are two causes of 
abnormal behavior in the solution to Eq. (4.2.14): 

1. Changes in external forces or initial conditions (these 
changes can be measured by Lyapunov stability and 
associated with operational abnormalities). 

2. Changes in the parameters a,-, W ly -, and w Jk , i.e., changes 
in the structure of the function F in Eq. (4.2.8). These 
changes are measured by structural stability and asso- 
ciated with structural abnormalities. They can be linked 
to the theory of catastrophe. 

In this section we introduce the following measure for 
structural abnormalities: 



j = i 


where 


are the nominal, or “healthy” values of the parameters, and 
a ■ are their current values. Thus, if 

£=0, or t=|e| (4.2.18) 

where e is sufficiently small, then there are no structural 
abnormalities. The advantage of this criterion is in its 
simplicity. It can be periodically updated, and therefore, the 
structural “health” of the process can be easily monitored. 

The only limitation of this criterion is that it does not 
specify a particular cause of an abnormal behavior. 
Obviously, this limitation can be removed by monitoring 
each parameter a • separately. 

4.2.4 Nominal Confidence Intervals 

In the previous section, the state of the underlying 
dynamical system generating sensor data was defined by the 
dynamical invariants, a,-, W 4 -, w Jk , i.e., auto -regressive coef- 
ficients and neural network weights. These invariants were 
calculated during a selected training period over N values of 
the original time series. We will associate this period with a 
short-term training. Obviously, during the period all the 
invariants a,- are constants. 

In order to introduce a long-term training period, we 
return to the original data: 

x=x(Q, i=0,l, ... etc (4.2.19) 

and consider them within the interval shifted forward by y 
points: 
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x 1 = x - 1 {t i ), i=q,q+ 1, . . . etc, 


x 2 =x 2 (ti), i=2q,2q+l, . . . etc, 


x p =x p ( td, i=Pq,Pq+l ■ ■ ■ etc, (4.2.20) 5 

where p is the number of q-shifts. 

For each series of data Eq. (4.2.20) one can compute the 
same invariants a t - by applying the same sequence of algo- 
rithms as those applied to the original data Eq. (4.2.19). In 10 
general, even in case of absence of any abnormalities, the 
values for a t - for different p are different because of mea- 
surement and computational errors, so that a,- will occur as 
series of p: 

15 

a i =a i(p)> . . . p. (4.2.21) 

Since p is proportional to time: 


p~qAt, 


(4.2.22) 


20 


where At is the sampling time interval, Eq. (4.2.21) actually 
represents another time series as shown in FIG. 4, and 
therefore, it can be treated in the same way as the original 
time series Eq. (4.2.19). However, such a treatment applied 
to each invariant as is very costly, and for most practical 
cases unnecessary. Indeed, since all gross nonstationarities 
(in the forum of the least-square polynomial trend and 
dominating harmonies) were already identified, it is very 
unlikely that the time series Eq. (4.2.21) contains any 
additional non-stationaries. Besides, since these invariants 
are associated with certain physical properties of the under- 
lying dynamical system, their deviations from the original 
constant values can be interpreted as a result of errors in 
measurements and computations. Thus a simple statistical 
analysis of the time series Eq. (4.2.21) will be sufficient for 
defining the nominal confidence intervals. 

In order to perform the statistical analysis for the time 
series Eq. (4.2.21), one could generate a histogram. 
However, in many practical cases, such a histogram may 
exhibit a non-Gaussian distribution, which makes it harder 
to define the confidence intervals. Hence it is more conve- 
nient to apply a “bootstrapping” approach as a preprocessing 
procedure in the following way: 

1. Choose randomly P samples of data from Eq. (4.2.21) 
with the same sample size n~p/2: 
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, (i) a (i) 

P U pl > U p2 > 


( 1 ) 




( P ) 


2. Find the sample means: 



(4.2.23) 50 

(4.2 .24) 
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The bootstrapping procedure guarantees that the distribu- 
tion of the means a ® will be Gaussian even if the original 
distribution was non-Gaussian, and simple Gaussian nomi- 
nal confidence intervals can be constructed. As an example, 65 
nominal confidence intervals which include -68%, -95% 
and -99.73% of all the sample means, a are: 


u a _(/) 

Pa - —j= < ay <p a + —= 

yn yn 


~ 0~a „(|) _ 0~a , 

Pa ~ 2-= < ay <p a + 2—, and 
yn yn 


- o~a „(,) _ er a 

Pa - 3— = < ay < Ha +3 — 
yn yn 


(4.2.25) 

(4.2.26) 

(4.2.27) 


respectively, where ju a and o a are the mean and standard 
deviation of the sample’s mean distribution. 

4.2.5 Implementation Architecture 

The implementation architecture of the Dynamical Invari- 
ant Anomaly Detector is shown in FIG. 5. 

After the sensor data are stationarized, they are fed into 
the Memory Buffer 502, which keeps a time history of the 
sensor data for analysis as requested by the Critical Signal 
Selector 504. The Critical Signal Selector will then pass the 
relevant sensor data to the Yule-Walker Parameter Estimator 
506. There, the dynamical invariants a,- and the AR coeffi- 
cients of the time-delay equation are computed using Yule- 
Walker method as shown in Eq. (4.2.12). The Critical Signal 
Selector module is used where the system is processing 
resources are limited so that processing of all available 
signals is untenable. For such applications, the critical 
signals are a combination of: (1) key signals identified 
during the design phase, especially those that have low 
redundancy; and (2) signals implicated by the other 
components, namely bad signals from the Coherence -based 
detector 206 and signals key to the present operating mode 
as sensed by the Symbolic Data Model 204. 

Once the coefficients are computed within the Yule- 
Walker Parameter Estimator, they will be compared to the 
ones stored in the Model Parameter Database 510. This 
contains a set of nominal time -delay equation coefficients 
appropriate for particular operating mode. A statistical com- 
parison 508 will be made between the stored and just- 
computed AR coefficients using the bootstrapping method as 
outlined in section 4.2.4, and if a discrepancy is detected, the 
identity of the offending sensor will be sent to the Channel 
Anomaly Detector 516. 

Further analysis is carried out on the residual or the 
difference between the sensor data values and the model 
predicted values, i.e. the uncorrelated noise, by the Nonlin- 
ear Component Neural Classifier 512 and Noise Analysis 
514 modules. The Nonlinear Component Neural Classifier is 
designed to extract the nonlinear components, which may be 
missed by the linear Yule -Walker Parameter Estimator. The 
weights of the artificial neural network (Eq. 4.2.14), another 
set of dynamical invariants, will be computed and compared 
with nominal weights stored in the Model Parameter Data- 
base. Similarly, the noise characteristics, such as the 
moments of probability distribution, are dynamic invariants 
for stationarized sensor data, and will be compared with 
those stored in the Model Parameter Database 510. If any 
anomalies are detected in either the nonlinear components or 
the noise, the identity of the sensor will be sent to the 
Channel Anomaly Detector. 

Finally, the Channel Anomaly Detector 516 will aggre- 
gate information from the Yule-Walker Parameter Estimator, 
Nonlinear Component Neural Classifier, and Noise Analysis 
modules, and classify the anomaly before sending the fault 
information to the Predictive Comparison module, which is 
discussed in section 4.4 below. The classification is kind of 
a checklist. In other words, the Dynamical Invariant 
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Anomaly Detector 208 effective asks a series of questions — 
Do we see a glitch in the linear component (yes/no)? Do we 
see a glitch in the nonlinear component? Do we see a glitch 
in the noise? If the answer to all of these is no, there is no 
fault or the fault is a feedback effect caused by a different 
signal. If the answer to the noise question is yes but others 
are no, the fault is caused by a degradation in the sensor, and 
so on. 

4.3 Model Filter — Gray Box Method of Sensor 
Data Analysis 

While the model filter component 202 occurs first in the 
data flow shown in FIG. 2, it was more useful to consider it 
after having first discussed the Dynamical Invariant Detector 
208 described above in section 4.2. The Dynamical Invariant 
Anomaly Detector represents a “Black Box” strategy, where 
nothing is known about the internal governing equations of 
a system. On the other hand, as will be explained below, the 
model filter represents a “Gray Box” because there is only 
a partial understanding of the process(es) being modeled; 
i.e., we only have a partial physical model of the process in 
question. 

Such a linear approach is effective and general, but there 
are profound benefits to simultaneous consideration of sen- 
sor and physical information, as we will explore in this 
section. We will make frequent reference to section 4.2. 

Sensor data is one of the most reliable sources of infor- 
mation concerning the state and operational performance of 
the underlying system. When this data is presented in the 
form of a time series, two important results can be obtained: 
prediction of future time series values from current and past 
values, and detection of abnormal behavior of the underly- 
ing system. In most existing methodologies, a solution to the 
second problem is based upon the solution to the first one: 
abnormal behavior is detected by analysis of the difference 
between the recorded and the predicted values of the time 
series. Such analysis is usually based upon comparison of 
certain patterns, such as the average value, the average 
slope, the average noise level, the period and phase of 
oscillations, and the frequency spectrum. Although all of 
these characteristics may have physical meaning and carry 
information, they can change sharply in time when a time 
series describes the evolution of a dynamical system, such as 
the power system of a spacecraft or the solar pressure. 
Indeed, in the last case, a time series does not transmit any 
“man-made” message, and so it may have different dynami- 
cal invariants. In other words, it is reasonable to assume that 
when a time series is describing the evolution of a dynamical 
system, its invariants can be represented by the coefficients 
of the differential (or the time-delay) equation which simu- 
lates the dynamical process. Based upon this idea, we have 
developed the following strategy for detection of abnormali- 
ties: a) build a dynamical model that simulates a given time 
series, and then b) develop dynamical invariants whose 
change manifests structural abnormalities. 

It should be noted that there are two types of abnormal 
behavior of dynamical systems, namely operational and 
structural. Operational abnormalities can be caused by unex- 
pected changes in initial conditions or in external forces, but 
the dynamical model itself, in this case, remains the same. 
In mathematical terms, operational abnormalities are 
described by changes in non-stationary components of the 
time series. 

It is important to emphasize that operational and structural 
abnormalities can occur independently, i.e. operational 
abnormalities may not change the parameters of the struc- 
ture invariants, and vice-versa. 


30 

4.3.1 Gray Box Approach 

Further examples and clarifying details of this aspect of 
the invention can be found in Appendix B attached herewith. 
5 As discussed, the methodology described in section 4.2 
can be termed a black-box approach since it does not require 
any preliminary knowledge about the source of the sensor 
data. Such an approach can be justified for developing 
dynamical models of systems whose behavior can be iden- 
titled only from their output signals. However, to fully assess 
system health, the diagnostic system must have comprehen- 
sive ability to sense not failures, but impending failures, and 
operational difficulties. While fixed thresholds, i.e., tradi- 
tional redlines, may be sufficient for simple steady-state 
5 systems, more sophisticated diagnosis techniques are needed 
for unsteady operations and detection of incipient faults. 

The natural starting point for a more sophisticated diag- 
nosis is the model of the system. Fortunately, many systems, 
such as aircraft, spacecraft, gas turbine engines, hydraulic 
20 systems, etc., usually have well-developed dynamic models. 
The most straightforward application of the model for 
diagnosis is to compare the observed sensor readings to 
those predicted by a model. If the difference between the 
observed and the predicted values, i.e., residual, is greater 
25 than some set threshold value, an anomaly has occurred. In 
practice, however, it is not straightforward to compare the 
observed and predicted values because the quality of the 
model may vary and noise may be present. If the model is 
inaccurate or has insufficient detail, the predicted values may 
30 differ greatly from those of the actual system. Some devia- 
tions are unavoidable since there is no theoretical descrip- 
tion for the phenomenon. For example, secondary effects 
such as friction, thermal effects, sensor noise, etc., may not 
have simple model descriptions. In other cases, the model 
35 can be purposely coarse, i.e., has insufficient detail, to 
facilitate real-time computations. 

In an effort to mitigate the problem of comparing 
observed and predicted values, many different approaches 
have been developed to generate robust residuals and/or 
40 thresholds for anomalies. These methods include adaptive 
threshold methods, observer-based approaches, parity rela- 
tion methods, parameter estimation methods, and statistical 
testing methods. 

In adaptive threshold methods, the threshold on the dif- 
45 ference between the observed and predicted value is varied 
continuously as function of time. The method is passive in 
the sense that no effort is made to design a robust residual. 
The UIO (unknown input observer) and parity relation 
methods are active since the residual is made to be robust to 
50 known disturbances and modeling errors. The residual is 
sensitive to only unknown disturbances that are likely to be 
anomalies or faults in the system. The drawback of these 
methods is that the structure of the input disturbances and 
modeling error must be known. In addition, the methods are 
55 applicable to mostly linear systems. The parameter estima- 
tion methods use system identification technique to identify 
changes in the model parameters of the dynamical system. 
The advantage of this method is that the implementation is 
straightforward, and it can deal with non-linear systems. The 
60 disadvantage is that a large amount of computational power 
may be required to estimate all of the parameters in the 
model. Finally, statistical testing methods use statistical 
techniques such as weighted sum-squared residual (WSSR), 
x 2 testing, sequential probability ratio testing (SPRT), gen- 
65 eralized likelihood ratio (GLR), etc., to differentiate between 
normal noise and anomalous sensor values. The disadvan- 
tage of this method is that the residual is assumed to be a 
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zero-mean white noise process with known covariance 
matrix. The residual in many cases may not be describable 
in this manner. 

In effort to improve model-based fault diagnosis, we 
propose a new approach called the gray-box method. It is 5 
called a “gray-box” because it incorporates both a “black- 
box,” i.e., stochastic model and a “white-box,” i.e., deter- 
ministic model. It is a hybrid model incorporating elements 
from residual based methods and parametric estimation 
methods. It is similar to an adaptive threshold methods in 1Q 
that a residual is generated without any regard for robust 
residual generation. However, instead examining the ampli- 
tude of the residual as in the case of the adaptive threshold 
methods, the structure, i.e. model parameters, of the residual 
is examined. The residual generation is our “white-box.” 
The residual is modeled using techniques similar to the 15 
parametric estimation methods. The method is distinct from 
standard parametric estimation method in that the system 
identification is carried out on the residual, not the system 
variables directly. The residual is parameterized, not the full 
system. In our terminology, the parameter estimation 20 
method is a “black-box.” 

A high-level generalized block diagram of the gray-box 
method for the model filter 202 (FIG. 2) is shown in FIG. 

7A. The physical system is represented by box 702. After 
filtering the deterministic components using the model of the 25 
system, the residual is separated into its linear, non-linear, 
and noise components and is fitted to stochastic models. The 
parameters to the linear, non-linear, and noise models 716 
completely describe the residual. The gray-box has several 
advantages. First, the full model is employed rather than 
only the model structure as in the case of standard paramet- 
ric estimation methods. Thus the gray-box takes full advan- 
tage of the information about the system. Second, the 
gray-box method can be made robust to modeling errors 
which can be taken care of during residual modeling. The 
model of the residual can also describe many unmodeled 
phenomena in the original model. Finally, the method is 
applicable to both linear and non-linear systems. 

4.3. 1.1 Residual Generation 40 

Any theoretical dynamical model includes two types of 
components: those directly describing the phenomena asso- 
ciated with the primary function of the system (such as the 
effect of torque exerted on a turbine shaft on rotor speed), 
and those representing secondary effects (such as frictional 45 
losses, heat losses, etc.). The first type of components is 
usually well understood and possesses a deterministic ana- 
lytical structure, and therefore its behavior is fully predict- 
able. On the other hand, the second type may be understood 
only at a much more complex level of description (including 50 
molecular level) and cannot be simply incorporated into a 
theoretical model. In fact, some components may be poorly 
understood and lack any analytical description, e.g. viscosity 
of water in microgravity. Thus, in accordance with the 
present invention, we filter out contributions that are theo- 55 
retically predictable from the sensor data (i.e., the compo- 
nents of the first type), and focus on the components of the 
second type whose theoretical prediction is lacking. As will 
be seen, the filtering will be performed by the theoretical 
model itself. 60 

The residual generation is as follows. Let us assume that 
the theoretical model is represented by a system of differ- 
ential equations: 

m =Ax(t),u(t))+y(t), (4-3.1) 65 

where x(t) is the state variable vector, u(t) is the known 
input, and f is the known theoretical relationship following 


from conservation laws of mechanics, thermodynamics, etc. 
The last term, y(t), represent components which lack theo- 
retical descriptions, are too complex, or are the result of 
modeling errors. These can include sensor noise, unknown 
input to the system, friction in bearings, material viscosity, 
and other secondary effects such as torsional oscillations of 
the shaft, flutter in the turbine and compressor blades, 
incomplete fuel combustion, etc. 

The estimate of the system is accomplished by substitut- 
ing the observed sensor data for the evolution of the state 
variables, x*(t), and input, u(t). Hence: 

x*{t)=f{x*{t),u{t)). (4.3.2) 

The residual, 

r(t)=x{i)-x(t), (4.3.3) 

is generated by subtracting the solution of Eq. (4.3.2), x*(t), 
which is generated by using the observed state variables, 
x*(t), from the solution of Eq. (4.3.1). Hence the original 
theoretical model is the filter. 

In general, the residual can be treated as another realiza- 
tion of some stochastic process. If the theoretical model of 
Eq. (4.3.1) is accurate, accounts for most physical effects, 
and the observed state variables are accurate, then the 
residual, |r(t)|, will be very small, i.e., 

K0|«|**(0l, (4.3.4) 

and either a fixed or an adaptive threshold can be assigned 
as criteria for anomalous behavior. If the system is linear, 
and the structure of y(t) is known, a more sophisticated UIO 
(unknown input observer) filter can be constructed to make 
the residual more robust modeling errors and disturbances. 
However, in our gray -box approach, the simple form of Eq. 
(4.3.3) is preferred over the more robust residuals since the 
residual is to be modeled. If the residual is too robust, the 
characteristic structure of y(t) will become hidden. 

Merely as an example to illustrate the foregoing, consider 
the simplest gas turbine plant consisting of a turbine 1, a 
compressor 2, and a combustion chamber 3, as shown in 
FIG. 6. Ignoring the thermal inertia of the combustion 
camber, one can write the following dynamic equation for 
the angular velocity, 00 , of the shaft. 

do) (4.3 .5) 

J — r — = Mi(fc>, u) - M 2 (w) - M r (t ) 
ot 


where J is the moment of inertia of the turbo -compressor 
(1-2) in the axis of rotation, M 1 is the turning moment 
generated by the turbine, M 2 is the resistive moment applied 
by the compressor, bearings, etc., on the shaft, // is the rate 
of fuel burned inside the combustion camber, and M r (t) is 
the random moment applied by effects such as torsional 
vibration of the shaft, blade flutter in the compressor and 
turbine, propagation of pressure pulses along the pipelines, 
heat loss, seal leaks, etc. 

The conditions for stability of the solutions of Eq. (4.3.5) 
are: 


d 

d(i) 


dM 2 d 

— ->0;i.e., — (M l -M 2 )<0. 
0(l> ooj 


(4.3.6) 


Consequently, if one linearizes Eq. (4.3.5) with respect to 
a stationary regime where the rate of fuel burn is constant, 
i.e., 


ji=pL 0 =const. 


(4.3.7) 
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Eq. (4.3.5) can be reduced to the form: 


(o=-Y(o+r(r), 


where 


1 ft) 

7= j[ da> 


3M 2 (w)‘ 

da) \n=n 0 


> 0, and 



(4.3.8) 


(4.3.9) 

(4.3.10) 


T(t) represents a stochastic force, and Eq. (4.3.8) is a 
Langevin equation whose formal solution is: 

r / (4 3 11) 

e-*" ) r(t')dt' 


stochastic process which can be described only in terms of 
probability distributions. However, any information about 
this distribution can not be obtained from a simple realiza- 
tion of a stochastic process unless this process is stationary. 
5 Then the ensemble average can be replaced by the time 
average. An assumption about the stationarity of the under- 
lying stochastic process would exclude from consideration 
such important components of the dynamical process as 
linear and polynomial trends, or harmonic oscillations. Thus 
10 a method is needed to deal with non-stationary processes. 

Our approach to building a dynamical model of the 
residual is based upon progress in three independent fields: 
nonlinear dynamics, theory of stochastic processes, and 
artificial neural networks. From the field of nonlinear 
15 dynamics, based upon the Takens theorem, any dynamical 
system which converges to an attractor of a lower (than 
original) dimensionality can be simulated with a prescribed 
accuracy by the time-delay equation: 


subject to the initial condition: 


20 x{i)=F(x(t-x)pc(t-2x), . . . pc(t-mx)). 


(4.3.15) 


( 0 =( 0 o at f=0. (4.3.12) 

This solution is the only information that can be obtained 
from the sensor data. The first term in Eq. (4.3.11) is fully 
deterministic and represents all of the theoretical knowledge 
about the plant. The second term includes the stochastic 
force (Eq. 4.3.10) and is stochastic. Hence, the stochastic 
process described by Eq. (4.3.11) represents only a part of 
the sensor data. 

Using a black box approach, a complex preprocessing 
procedure is required to extract the stochastic force T(t) from 
the sensor data which is the solution of Eq. (4.3.112). 
However, the proposed gray box approach eliminates this 
preprocessing. Substituting the sensor data Eq. (4.3.11) into 
the theoretical model of Eq. (4.3.8), the original stochastic 
force is immediately exposed as the inverse solution: 

r(r)=ou*+Y(D*, (4.3.13) 

where oo* is the sensor data. 

Eq. (4.3.11) shows that the more stable the model, i.e., the 
larger the value of /*, the less the stochastic force T(t), 
contributes to the sensor data, since: 

at t>t ' (4.3.14) 

In other words, for highly stable dynamical models, the 
black box approach is not only more laborious, but is also 
less effective since the stochastic forces become deeply 
hidden in the sensor data. For the gray box approach, on the 
other hand, the theoretical model acts as a filter, which 
damps the deterministic components and amplifies the sto- 
chastic components. It is noted that Eq. (4.3.13) represents 
only one sample of the sought stochastic force, T*(t). A 
black box approach still needs to be applied to T*(t) in order 
to reconstruct all of its stochastic properties. 

4.3. 1.2 Residual Modeling 

For the model of the residual, we start with a traditional 
description of sensor data given in the form of a time series 
which describes the evolution of an underlying dynamical 
system. It will be assumed that this time series can not be 
approximated by a simple analytical expression and is not 
periodic. In another words, for an observer, the future values 
of the time series are not fully correlated with the past ones, 
and therefore, they are apprehended as random. Such time 
series can be considered as a realization of an underlying 


where x(t) is a given time series, such as a variable in the 
residual vector, r(t), and t is the constant is the time delay. 

It has been shown that the solution to Eq. (4.3.15), subject 
to appropriate initial conditions, converges to the original 
time series: 

x(f)=x(fi)^ 2 ), . . . , (4.3.16) 

if m in Eq. (4.3.15) is sufficiently large. 

However, the function F, as well as the constant t and m, 
are not specified by this theorem, and the most “damaging” 
limitation of the model of Eq. (4.3.15) is that the original 
time series must be stationary since it represents an attractor. 
This means that for non-stationary time series the solution to 
Eq. (4.3.15) may not converge to Eq. (4.3.16) at all. Actually 
this limitation has deeper roots and is linked to the problem 
of stability of the model of Eq. (4.3.15). 

A discrete-time stochastic process can be approximated 
by a linear autoregressive model: 

x(t)=a 1 x(t-l)+ajc(t-2)+ . . . a n (t-ri)+z{t ) an oo, (4.3.17) 

where a,- are constants, and z(t) represents the contribution 
from white noise. 

It can be shown that any zero-mean purely non- 
deterministic stationary process x(t) possesses a linear rep- 
resentation as in Eq. (4.3.17) with 

Z “j < 

j= i 

i.e., the condition of the stationarity. 

In order to apply Eq. (4.3.17), the time series Eq. (4.3.16) 
must be decomposed into its stationary and non-stationary 
components. To “stationarize” the original time series, cer- 
tain transformations of Eq. (4.3.16) are required. These 
types of transformations follow from the fact that the 
conditions of stationarity of the solution to Eq. (4.3.17) 
coincide with the conditions of its stability, i.e., the process 
is non-stationary when 

\G^l, (4.3.18) 

where G a are the roots of the characteristic equation asso- 
ciated with Eq. (4.3.17). 

The case |G 1 |^1 is usually excluded from considerations 
since it corresponds to an exponential instability which is 
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unrealistic in physical systems under observation. However, 
the case |G 1 |=1 is realistic. Real and complex conjugates of 
G-l incorporate trend and seasonal (periodic) components, 
respectively, into the time series Eq. (4.3.16). 

By applying a difference operator: 5 

\7x(t)=x(t)-x(t-l)=(l-B)x(t), (4.3.19) 

where B is defined as the backward shift operator, as many 
times as required, one can eliminate the trend from the time 
series: 

x{t),x{t-l)jc{t-2), (4.3.20) 

Similarly, the seasonal components from the time series 
Eq. (4.3.20) can be eliminated by applying the seasonal 15 
difference operator: 

V^(0Kl-^)*(0=*(0-*('-4 (4.3.21) 

In most cases, the seasonal differencing Eq. (4.3.21) 
should be applied prior to the standard differencing Eq. 
(4.3.19). 

Unfortunately, it is not known in advance how many times 
the operators Eq. (4.3.19) or Eq. (4.3.21) should be applied 
to the original time series Eq. (4.3.20) for their stationar- 
ization. Moreover, in Eq. (4.3.21) the period s of the 
seasonal difference operator is also not prescribed. However, 
several methods are known to estimate the order of differ- 
entiation. One simple estimate of the number of operations 
for Eq. (4.3.20) is minimizing the area under the autocor- ^ 
relation curve. 

Once the time series Eq. (4.3.20) is stationarized, one can 
apply to them the model of Eq. (4.3.15): 


4.3. 1.3 Model Fitting 

The models Eq. (4.3.22) and Eq. (4.3.24) which have been 
selected in the previous section for detection of structural of 
abnormalities in the time series Eq. (4.3.20), have the 
following parameters to be found from Eq. (4.3.20): the 
function, F, the time delay, t, the order of time delays, m, the 
powers, m 1 , and m 2 , of the difference (1-B) ml and the 
seasonal difference (l-B^™ 2 , and the period s of the sea- 
sonal operator. 

The form of the function F we’ve selected for the residual 
is shown in FIG. 7B. After stationarization, the linear 
component is fit using the Yule-Walker Equations which 
define the auto regressive parameters a,- in Eq. (4.3.17) via 
the autocorrelations in Eq. (4.3.20). If sufficient, the residual 
left after removal of the linear component, w(t), can be 
directly analyzed and modeled as noise. 

If the linear model of the residual leads to poor model 
fitting, the best tool for fitting the non-linear component of 
the residual may be a feed-forward neural network which 
approximates the true extrapolation mapping by a function 
parameterized by the synaptic weights and thresholds of the 
network. It is known that any continuous function can be 
approximated by a feed-forward neural net with only one 
hidden layer, and thus is selected for fitting the non-linear 
component after the linear component is removed using 
equation Eq. (4.3.17). Hence w(t) is sought in the following 
standard form of time-delay feed-forward network: 



(4.3.25) 


y(t)=F 


where 


y{t),y{t-l), . . . ; (y(t)=x(t)-x(t-l)) 


(4.3.22) 


(4.3.23) 


35 where W l7 - and w fk are constant synaptic weights, a(x)=tanh 
(x) is the sigmoid function. 

The model fitting procedure is based upon minimization 
of the mean standard error: 


are transformed series Eq. (4.3.20), and t=1. After fitting the 
model of Eq. (4.3.22) to the time series Eq. (4.3.20), one can 
return to the old variable x(t) by exploiting the inverse 
operators (1-B) -1 and (l-B^ -1 . For instance, if the station- 
arization procedure is performed by the operator Eq. 
(4.3.19), then: 


E(W lj ,w jk ) = 


Z 


z(t - i ) - o-< 


v — \ m 

/ w ij X Wjk z{ * ~ kr ~ ® 

< t b-\ 

j= i 1 \ 


(4.3.26) 


x(r)=x(r-l) + f([x(r-l)-x(r-2)],[x(r-2)-x(r-3)l . . . ). (4.3.24) 


The error measure Eq. (4.3.26) consists of two parts: 


Eq. (4.3.24) can be utilized for modeling the residual, 
predictions of future values of Eq. (4.3.20), as well as for 
detection’s of structural abnormalities. However, despite the 
fact that Eqs. Eq. (4.3.22) and Eq. (4.3.24) may be signifi- 
cantly different, their structure is uniquely defined by the 
same function F. Therefore, structural abnormalities which 
cause changes of the function F, can also be detected from 
Eq. (4.3.22) and consequently for that particular purpose the 
transition to Eq. (4.3.24) is not necessary. 

It should be noted that strictly speaking, the application of 
the stationarization procedure Eq. (4.3.19) and Eq. (4.3.21) 
to the time series Eq. (4.3.20) are justified only if the 
underlying model is linear since the criteria of stationarity 
for nonlinear equations are more complex than for linear 
ones in the same way as the criteria of stability are. 
Nevertheless, there are numerical evidences that even in 
nonlinear cases, the procedures Eq. (4.3.19) and Eq. (4.3.21) 
are useful in a sense that they significantly reduce the error, 
i.e., the difference between the simulated and the recorded 
data if the latter are non-stationary. 


E=E X +E 2 , (4.3.27) 

where E 1 represents the contribution of a physical noise, 
while E 2 results from non-op timal choice of the parameters 
of the model of Eq. (4.3.25). 

There are two basic sources of random components E 1 in 
time series. The first source is chaotic instability of the 
underlying dynamical system; in principle, this component 
of E-l is associated with instability of the underlying model, 
and it can be represented based upon known and understood 
stabilization principles. The second source is physical noise, 
imprecision of the measurements, or human factor such as 
multi-choice decisions in economical or social systems, or 
the driving habits in case of the catalytic converter of a car, 
etc. 

The last component of E a cannot be presented by any 
model based upon classical dynamics, including Eq. 
(4.3.22). However, there are models based upon a special 
type of dynamics called terminal, or non-Lipschitz- 
dynamics, which can simulate this component. In the sim- 
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plest case one can assume that E 1 represents a variance of a 
mean zero Gaussian noise. 

The component E 2 , in principle, can be eliminated by 
formal minimization of the error measure Eq. (4.3.26) with 
respect to the parameters W 1; -, w Jk , x, m, m l9 m 2 , and s. 

Since there is an explicit analytical dependence between 
E and W 2 -, w jk , the first part of minimization can be 
performed by applying back-propagation. However, further 
minimization should include more sophisticated versions of 
gradient descent since the dependence E(t, m, m 1? m 2 , s) is 
too complex to be treated analytically. 

4.3. 1.4 Anomaly Detection 

As discussed in the previous section, there are two causes 
for abnormal behavior in the solution to Eq. (4.3.25): 1) 
Changes in external forces or initial conditions (these 
changes can be measured by Lyapunov stability and asso- 
ciated with operational abnormalities). 2) Changes in the 
parameters W 1; -, w Jk , i.e., changes in the structure of the 
function F in Eq. (4.3.22). (These changes are measured by 
structural stability and associated with structural abnormali- 
ties. They can be linked to the theory of catastrophe). 

The measure we use for anomaly detection in the non- 
linear component is: 

v-i |7 ° / ° \^1 (4.3.28) 

where W l7 - and w Jk , are the nominal, or “healthy” values of 
the parameters, and W ly -, w Jk , are their current values. If 

SH4 (4-3.29) 
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randomness is introduced into the forecast. Such random- 
ness is incorporated into the underlying dynamical model by 
considering the time series for t^O as a realization of some 
(unknown) stochastic process. Then the future values for t>0 
can be presented in the form of an ensemble of possible time 
series, each with a certain probability (FIG. 8). After aver- 
aging the time series over the ensemble, one can represent 
the forecast in the form shown in FIG. 9, i.e., as the mean 
value of the predicted data and the probability density 
distributions. 

The methodology of time series forecast is closely related 
to those of model fitting and identification, as discussed in 
the Dynamical Invariant Anomaly Detector section above. 
In general, the non-stationary nature of many sensor data 
may lead to misleading results for future data prediction if 
a simple least-square approach to polynomial trend and 
dominating harmonics is adopted. The correct approach is to 
apply inverse operators (specifically difference and seasonal 
difference operators) to the stationary component of the time 
series and forecast using past values of the time series. 

To illustrate this methodology, start with a time series: 

x=x(t i ),i=Q,l, . . . , N (4.4.1) 

and assume, for demonstration purposes, that its stationar- 
ization requires one seasonal differencing: 

y =V^=x,-x^ (4.4.2) 

y=y(t l ),i= 0,1, ...,N-s (4.4.3) 

and one simple differencing: 

z t =Vy t =yry t -i (4.4.4) 


where e is sufficiently small, then there is no structural 
abnormalities. The advantage of this criterion is in its 
simplicity. It can be periodically updated, and therefore, the 
structural “health” of the process can be easily monitored. 

Similar criteria can be generated for the parameters of the 
linear component, a 7 -, and the noise component which is 
modeled by the variance or higher moments. Unfortunately, 
there is no general method for setting the threshold, e, other 
than experience and heuristic methods. This is a problem 
faced by all fault diagnosis. 

4.4 Predictive Thresholds (Prognostic Assessment) 

Referring now to FIG. 10 , the prognostic assessment 
component 212 shown in FIG. 2 comprises a channel level 
prognostics algorithm. It is intended to identify trends which 
may lead sensor data values to exceed limit values, such as 
redlines. A stochastic model such as the auto-regressive 
model is used to predict values based upon previous values, 
i.e., forecasting. The aim is to predict, with some nominal 
confidence, if and when the sensor values will exceed its 
critical limit value. This permits warning of an impending 
problem prior to failure. 

4.4.1 Forecasting Time Series 

In general, time -series forecasting is not a deterministic 
procedure. It would be deterministic only if a given time 
series is described by an analytical function, in which case 
the infinite lead time prediction is deterministic and unique 
based upon values of the function and all its time derivatives 
at t=0. In most sensor data, this situation is unrealistic due 
to incomplete description by the analytic model, sensor 
noise, etc. In fact, the values of a time series may be 
uncorrelated with the previous ones, and an element of 


Z=Z(t i ),i=0,l, . . . , N-s- 1. (4.4.5) 

Based upon the methodology discussed in section 4.2, one 
can build a model for Eq. (4.4.5) as: 

Z t =F{Z t _ x , Zj_ 2 , . . . , Z t _ m ) + R t _ v (4.4.6) 

Here F may include a linear component in the form of an 
auto-regressive process, as well as a nonlinear component, 
represented by a neural network formalism. R r _ 1 is a 
sequence of independent random variables characterized by 
a certain variance, a 2 , which does not depend upon time; 
i.e., R f-1 is an error of representation of the time series Eq. 
(4.4.5) by the analytical expression: 

Z t =F{Z t _ x , Zj_ 2 , . . . , Z t _ J. (4.4.7) 

A model for the original time series can be easily derived 
from Eq. (4.4.6) based upon Eqs. Eq. (4.4.4) and Eq. (4.4.2). 
Eq. (4.4.3) becomes: 

y t =y l -i+F(\y t - 1 -y t - 2 l\y t - 2 -y t - 3 l ■ ■ ■ ,\y t - m -y t —i])+R-i (4-48) 

and Eq. (4.4.1) becomes 

X^X t _ i —X t _ s _- l +F(^X t _ i —X t _ s _ i —X t _2+X t _ s _2, . . . ,[x t _ m -x t _ s _ m -x t _ m _ 

(4.4.9) 

Thus, it follows from Eq. (4.4.9) that each new value of 
the time series can be predicted if the past (m+s+1) values 
are known. Hence, if 

N^m+s+l (4.4.10) 

all the necessary values of the variable x are available. 

However, in order to predict all the future values by 
means of successive iterations of Eq. (4.4.9), one has to find 
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the next values of the last term, R t , R f+1 , etc. Formally this 
is not possible since, as follows from Eq. (4.4.7), 


F(Z t £ i, . . . ,Z t _ m+1 ) 


(4.4.11) 


and these values depend on the values, x f , x t+1 , etc., which 
have not yet been measured. This is why the deterministic 
forecast of future data cannot be achieved in general. Turn- 
ing to the probabilistic approach, recall that the time series 
R f are represented by a sequence of an independent random 
variable, whose probability density does not depend upon 
time, since the time dependence of the statistical properties 
of R f was eliminated in the course of stationarization (Eqs. 
Eq. (4.4.2) and Eq. (4.4.4)) while all the correlated compo- 
nents were captured by the mathematical model in Eq. 
(4.4.9). Hence, the only statistical invariants of R f are the 
invariants of its probability density, i.e., the mean, the 
variance, and the higher moments. This means that all the 
values, R f , R f+1 , etc., can be drawn randomly from the time 
series, 


R-05 RlJ R-25 ■ ■ ■ 5 Rf-1 


Obviously, each sample 


R PF t+ W, . . . ,etc, 1=1,2, . . . , P 
will lead to different predicted values for 

x^, x t+ ^, . . . , etc., i= 1,2, . . . , P 


(4.4.13) 


by plotting a histogram for the function: 

x< l W 2 \ ■ • • , 
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(4.4.12) 
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(4.4.14) 
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However, since all of the samples in Eq. (4.4.13) are 
statistically identical, all the predicted values will represent 
different realizations of the same stochastic process forming 
an ensemble. 

For each x t one can find the probability distribution 35 
function: 


(4.4.15) 


40 


(4.4.16) 


Since the original time series Eq. (4.4.1) is non-stationary, 
the probability functions Eq. (4.4.15) will depend upon time 45 
(illustrated in FIG. 7B). Therefore, for each x f one can 
compute the statistical invariants such as the mean, 


(4.4.17) 
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the standard deviation a, 


(4.4.18) 
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as well as higher moments: 

Ml = £ (4° - rf f(4’) 


60 


(4.4.19) 
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Similarly, if one wants to forecast the time, tp when the 
values of the time series will reach a particular value x /? Eq. 


(4.4.9) can be evaluated many times for which tf=t x=x fis 
realized to generate a list of ip. 

h ( % (2) - ■ ■ ’hi (F) (4.4.20) 

from which a probability distribution function, 

f(t f V)=P(X=t f V) (4.4.21) 

can be created to determine the confidence interval or the 
likelihood of reaching t^at certain time, t. 

4.4.2 Implementation Architecture 

The implementation architecture for the Channel Prog- 
nosis module is shown in FIG. 10 below. The Predictor 1002 
is fed stationary data, the auto regressive model coefficients, 
past raw data values, and limit values, i.e., everything 
required to evaluate Eq. (4.4.9) plus a redline value at which 
to stop the computation. The predictor will generate many 
predictions of when the channel value will reach the redline 
limit value and pass them on to the Redline Confidence 
Estimator 1004. The Redline Confidence Estimator will then 
construct a probability distribution of the time when the 
channel value will exceed the redline limit. Finally, the 
Failure Likelihood Estimator 1006 takes the probability 
distribution for t^and computes the likelihood (probability) 
that the channel value may exceed the redline value within 
some critical time X critical . If the probability exceeds a certain 
threshold, e.g., 99% confidence, then the critical time and its 
probability will be sent to the Causal System Model 216, 
which is discussed in section 4.5. 

Implementation of the prognostic assessment component 
214 is relatively easy provided guidelines exist about the 
allowed performance of each parameter. The model coeffi- 
cients are automatically sensed from data, leaving only the 
selection of a threshold — usually determined by control 
functions — to be determined. 

4.5 Symbolic Components 

The present invention is a system comprising a collection 
interrelated components configured to operate in accordance 
with numeric and symbolic algorithms with their combined 
result being a deep understanding of the current and pre- 
dicted health of a system. The invention uses a collection of 
novel numeric and symbolic approaches to synergistically 
perform real-time diagnosis and prognosis. These compo- 
nents in combination have the capability to fuse and simul- 
taneously analyze all system observables and automatically 
abstract system physics and information invariants. 

One of the biggest weaknesses of a purely symbolic 
approach is that they only detect and diagnose predicted 
problems while missing all unmodeled events or incorrectly 
identifying an unmodeled event as a modeled one. The 
numeric algorithms of the present invention are used pri- 
marily to detect unmodeled events whereas the symbolic 
algorithms are used to predict expected behavior, correlate it 
to the unmodeled events and interpret the results. Combin- 
ing these two methodologies makes it possible to correctly 
detect and diagnose modeled and unmodeled events in 
real-time. 

The combination of the two approaches provides an 
ultra-sensitive capability to find degradation and changes in 
a system and isolate those changes in both space and time 
and relate the failure to the physical system being modeled 
with near zero false alarms. 

This section focuses on the symbolic components of the 
system that model expected behavior and diagnose predicted 
faults and fuse the numeric engine results to form an overall 
conclusion to the health or future health of the system. The 
symbolic components consist of the following: 
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1. Symbolic Data Model 

2. Predictive Comparison 

3. Causal System Model 

4. Interpretation Layer 5 

Each of these will be studied individually. 

4.5.1 Method of Approach 

Our approach uses multiple knowledge representations 
for providing an automated capability to predict the current 10 
system state; detect failures; combine the former items with 
detected unmodeled events from the numeric engine; and 
interpret the final results and relate it back to the original 
models. 

The technical approach is to monitor all real-time data and 15 
actual operational stresses imposed upon a system (real-time 
data), model expected behavior and performance 
(engineering models), model causal interrelationships 
between predicted failures and unmodeled events (causal 
system model), detect anomalies (real-time data and 20 
inference) and translate a series of source hypotheses to one 
or more combined and focused hypotheses (see FIG. 11). 

Real-time measurements are combined with predicted and 
expected behavior along with predicted performance to 
quickly isolate candidate faults. The causal system model is 25 
then used to perform a deeper analysis to analyze the 
problem in detail and eliminate incorrect hypotheses. Elimi- 
nation of incorrect hypotheses helps to eliminate incorrect 
conclusions caused by incomplete knowledge. This infor- 
mation is stored and automatically reused to constantly 30 
refine the inference strategies to avoid dead-end paths. 

4.5.2 Present Practice 

Traditional practice for automatic fault detection and 35 
recovery has been a combination of detecting alarm 
conditions, derived from elaborate and cumbersome fault 
trees, and a priori predictions of expected failures based 
upon engineering models or fault estimates. This is a sys- 
tematic and definitive way of understanding health 4Q 
management, which is inefficient. 

The problem with this methodology is that it is designed 
to always assume the worst case in system reliability 
because maintenance is based upon hours-in-use assump- 
tions and unanticipated failures rather than operational 45 
stresses. This is like replacing the engine of a car when it 
reaches 200,000 miles because that is its anticipated 
lifespan. In reality, the lifespan of the engine depends upon 
the environment, how the car was driven, its service record, 

etc - 50 

A more serious problem with this approach has to do with 
predicted diagnostic space coverage. It has been shown that 
only twenty to thirty percent of faults can be predicted in 
advance. This leaves the diagnostic system blind to at least 
seventy percent of the problems that could actually occur in 55 
an operational system. 

The more accurate and cost effective approach is to 
predict failures based upon a combination of physics 
models, numeric analysis, and symbolic analysis techniques, 
including a symbolic data model of the predicted behavior of 60 
the system. This would include all explicit failures and 
predicted states. This information is combined with unmod- 
eled events through relationships between what is known or 
could be related to explicit failures with respect to the 
current state. This provides an efficient transition in the 65 
diagnostic paradigm from assumed failures versus which 
parts must be replaced. The net effect of this approach is to 
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provide all traditional automatic health monitoring 
capabilities, as well as the ability to track minute system 
changes over time, and to detect and predict unmodeled 
system failures. 

The present practice to diagnose and prognose systems is 
to anticipate all one-point and two -point failure modes. 
Elaborate checklists are constructed which, it is hoped, will 
serve to identify all of these failure models and recommend 
their corresponding maintenance actions. 

The inherent problem is that as the complexity of the 
system increases, the resources required to anticipate failure 
modes and construct exhaustive checklists becomes expo- 
nentially expensive. Furthermore, for diagnostic and prog- 
nostic analysis, checklist actions seldom embody the ratio- 
nale for the procedures that are being followed. This can 
make it tedious and difficult for humans who are performing 
the maintenance to focus on the immediate problem at hand. 

These techniques are often insufficient in many 
monitoring, diagnostic or maintenance applications. Control 
and diagnostic mechanisms in these approaches cannot be 
dynamically matched to the exigencies of the situation. They 
are typically inflexible and cannot easily accommodate the 
reconfiguration or modification of the system under test. 
Such systems are usually unresponsive to the varying 
degrees of skill of the different technicians who use them. 
Poor real-time performance is also symptomatic of conven- 
tional automation approaches to diagnosis and maintenance. 

Furthermore, as a maintenance prediction or diagnostic 
tool, checklists seldom embody the rationale for the proce- 
dures that are being followed. This can make it tedious and 
difficult for humans who are performing checklist actions to 
focus on the immediate problem at hand. 

An important consideration in a real-time system is quick 
responses to system failures and maintenance predictions 
may be critical. The ability of human operators and main- 
tenance personnel to compensate for a failure, determine a 
diagnosis with incomplete or partial information and quickly 
institute a recovery or maintenance procedure diminishes as 
system complexity increases. These considerations make the 
ability for long-term wear detection combined with main- 
tenance recommendations very desirable and an excellent 
candidate for automation. 

4.5.3 Technical Overview 

Traditional expert systems for diagnosis utilize only a 
single kind of knowledge representation and one inference 
mechanism. Our approach uses real-time measurements 
along with predicted and expected behavior, past 
performance, and unmodeled event detection to quickly 
isolate candidate faults. Combining these methodologies 
makes it possible to correctly detect and diagnose modeled 
and unmodeled events in real-time. The technical approach 
is as follows: 

1. Monitor all real-time data and actual operational 
stresses imposed upon a system (real-time data). 

2. Model expected behavior and performance 
(engineering models) 

3. Causal interrelationships between from predicted fail- 
ures and unmodeled events (Causal system model) 

4. Detect anomalies (real-time data and inference) 

5. Translate a series of source hypotheses to one or more 
combined and focused hypotheses. 

The causal system model 216 is used to perform a deeper 
analysis, using rules supplied by engineers, to eliminate 
incorrect hypotheses. This information is stored and auto- 
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matically reused to constantly refine the inference strategies 
to avoid dead-end paths. 

Using the symbolic components for diagnosis and mod- 
eling and BEAM for detection and prognostics, it provides 
a complete and unified diagnostic and prognostic system to 5 
diagnose prognose and interpret modeled and unmodeled 
events. This makes multiple tools unnecessary for the 
detection, diagnosis, prognosis and modeling functions. In 
this manner interfacing and representation of information is 
much easier and provides a single, clean and concise solu- 10 
tion. 

Let us examine the particular components in detail indi- 
vidually. 

Symbolic Data Model 15 

The Symbolic Data Model (SDM) 204, illustrated in FIG. 

12 above, is the first line of defense in determining the 
overall health of the system and it is the primary component 
that determines the system’s active states and predicted 
states. It operates by examining the values from status 20 
variables and commands to provide an accurate, evolving 
picture of system mode and requested actions. Since most 
rule-based diagnostic systems (expert systems) provide only 
this module and nothing else, they are limited in that they 
can only identify and diagnose anticipated problems. 25 

Knowledge in the SDM is represented as rules, which are 
composed of patterns. The rule is the first Aristotelian 
syllogism in the form: If . . . Then .... The variables of the 
syllogism are joined by the And/Or logical connections. The 
selector Else points to other cases. This formula is a rule; the 30 
rules are sequenced in the succession of logical thinking or 
pointed at a jump in the sequence (Else— >Go To). 

If Pattern 

Then Action 35 

Patterns are relations that may or may not have temporal 
constraints, i.e., may only hold true at certain times or persist 
for the entire duration of the analysis. Patterns define the 
constraints that must hold true in order for the antecedent to 
succeed. Some examples of relations are: 40 

SNR<60 and SNT>0.68 
SNR<20 While State-“Idle” 
Remaining_Distance*MPG<=Remaining_Fuel 
Conceptual representation is the main way for putting the 
patterns of the system into a computer program. The task of 45 
concatenation of pattern situations, preparation of the ways 
of reasoning, inference is the procedural representation of 
the meta patterns. The essential tool the SDM uses is a rule; 
that is the reason why another name for expert systems is 
rule-based system. 50 

The SDM operates by using many small slivers of knowl- 
edge organized into conditional If-Then rules. These rules 
are then operated on in a variety of different ways to perform 
different reasoning functions. 

Relational representation of traditional logic is occluded 55 
by the Closed World Assumption, which includes only a 
limited number of concepts and relations, and supports the 
hypothesis that the entire problem domain is explorable in a 
well-defined way. Uncertainty methods open some windows 
to the real world of unexplored, unexpected phenomena. 60 
This is especially true for the non-traditional uncertainty 
methods that ignore the hypothesis of excluded middle and 
of independence of basic events. The price of a more 
permissive method is the increased softness of their basic 
model and consequently the inefficiency of reasoning cap a- 65 
bility. From the point of view of logical and mathematical 
rigor, they are less and less welcome. 
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To avoid these obvious pitfalls the SDM uses modal 
operators. Modal operators grew out of the modalities of 
classical logic, i.e. the limitations of the validity to certain 
subjects and interpretation of fields. We accomplish this by 
providing a rich set of quantifiers for the patterns. The 
following modal operators are provided: 


1 . 

It is necessarily true that x 

Alethic logic 

2. 

It is possibly true that x 

Alethic logic 

3. 

It will always be true that x 

Temporal logic 

4. 

It will be sometimes that x 

Temporal logic 

5. 

It ought to be that x 

Deontic logic 

6. 

It can be that x 

Deontic logic 

7. 

It is known that x 

Logics of knowledge 

8. 

The opposite of x is not known 

Logics of knowledge 

9. 

It is believe that that x 

Logics of belief 

10. 

The opposite of x is not believed 

Logic of belief 


Because the numeric models take a certain amount of time 
to ascertain the current health of the system, the SDM is the 
primary defense in diagnosing its instantaneous health. Its 
primary input is the discrete data stream comprising the 
system status variables and system command information. 
This stream contains all the data channels that are necessary 
for the SDM to make an analysis. 

Unlike the numeric models, the SDM requires a knowl- 
edge base in order to perform its analysis functions. From 
several points of view, representation of knowledge is the 
key problem of expert systems and of artificial intelligence 
in general. It is not by chance that one of the favorite 
namings of these products is knowledge -based systems. 

The generic features of knowledge are embodied in 
representation. The domain expert stores the objects, 
actions, concepts, situations and their relations using the 
SHINE representation language (see SHINE description) 
and this is stored in the SDM knowledge base. The collec- 
tion of this knowledge represents the sum total of what the 
SDM will be able to understand. The SDM can only be as 
good as the domain expert that taught it. 

At the front end of the SDM is a Discrete Filter 1202. The 
purpose of the filter is to act as a rate governor to limit the 
amount of data being sent to the SDM at any given moment. 
Unlike the numeric models which have to analyze a broad 
spectrum of data to look for their correlations, the SDM, 
being a knowledge -based system knows at any given 
moment all the information that it will require to make its 
analysis. The SDM adjusts the discrete data filter so that the 
information content is maximized and superfluous data is 
eliminated. 

One of the common problems with monitoring real-time 
data streams is that the data often does not arrive in a 
consistent order. We have often found that related data from 
one source arrives long after its partnering data from another 
source. For example, you can be monitoring data from two 
different instruments. Each of these instruments can be 
producing their results at different data rates or, even worse, 
be delayed by some special processing. In a traditional 
expert system this would wreak havoc because they operate 
on data that is aligned on data frame boundaries. When data 
is skewed across multiple frames, then the expert system 
loses the operational context and gets confused because of 
conflicting contradictory data arriving. 

We eliminate this shortcoming by introducing a Discon- 
tinuous Temporal Ordering Event Detector (DTED) 1204. 
The DTED automatically derives temporal relationships 
from the knowledge base to allow data to arrive across 
multiple frame boundaries, i.e., their time tags do not exactly 
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match. This allows the SDM to delay its conclusions until all 
the data arrives and if the arrival of the skewed data would 
cause a change in its diagnosis, then those conclusions 
would be retracted before a false alarm or incorrect derived 
state is asserted. 5 

The SDM generates two primary kinds of results: derived 
states and discrepancies. To provide a uniform 
representation, we use the identical approach in performing 
each of these functions and they differ only in their knowl- 
edge bases that they use. 10 

Since two knowledge bases are being used, it is possible 
to generate erroneous results when instantaneous state 
changes occur and generate spikes in the data streams. To 
further eliminate false alarms we include an Event Correia- 
tor (EC) that matches the anomalies with state changes to 
filter out transient events that are not sustained across 
multiple frames. This provides for an even greater amount of 
insurance than when an anomaly is generated that it is in fact 
a real event and not transient phenomena. 2Q 

Predictive Comparison 

The Predictive Comparison (PC) component 214 shown 
in FIG. 13 compares the requested and commanded opera- 2 5 
tion of the system versus the sensed operation as interpreted 
from the time-varying quantities. Its goal is to detect mis- 
alignment between system software execution and system 
hardware operation. 

The PC combines the results from the numeric and 30 
symbolic engines and looks for confirmation and differences 
between them. It is the primary interface that merges the 
symbolic results for the system predicted state and explicit 
failures with the suspected bad channels from the dynamical 
invariant anomaly detector with the event signals from 35 
coherence -based fault detection algorithms. 

Its result is a sequence of confirmed predicted failures and 
detected unmodeled events. A failure is considered con- 
firmed when both the numeric and symbolic engines each 40 
predict the same failure or system state change. Unmodeled 
events are those cases where the numeric and symbolic 
engines differ in their conclusions. The capability of having 
parallel analysis engines running, each approaching the 
problem from an entirely different theoretical foundation, 45 
makes our system different and more powerful than the 
others. 

This module uses generic symbolic processing algorithms 
and does not require a knowledge base in order to perform 
its function. The following kinds of comparisons are made: 50 

1. Examines the system predicted state from the symbolic 
engine and correlates them to detected events from the 
numeric engine. If the numeric engine generates an 
event and it approximately correlates with a predicted 55 
state change, then the predicted state is considered 
confirmed. 

2. Examines the signals that are diagnosed as bad from the 
symbolic engine and correlates them with the suspected 
bad signals from the numeric engine and when there is 60 
agreement, then the channel is confirmed as bad. When 
there is a difference between the two, the signal is 
marked as an unmodeled event. 

The final component in the PC is to merge results from 
items 1 and 2 with the list of explicit failures and events so 65 
multiple redundant conclusions of bad signals and unmod- 
eled events are not generated. 
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Causal System Model 
Causal System Model 
Inputs 

Predicted failures from predictive comparison 

Unmodeled events from predictive comparison 

Redline estimates from prognostic assessment 

Discrete data 

Outputs 

Source Hypothesis 

The Causal System Model (CSM) shown in FIG. 14 is a 
connectivity matrix designed to improve source fault isola- 
tion and actor signal identification. In the SDM, the entire 
domain knowledge is represented as If-Then rules only. 
When the domain is very large and complex, an entirely 
rule-based representation and associated inference leads to a 
large and inefficient knowledge base causing very poor focus 
of attention. To eliminate such unwieldy knowledge bases in 
the SDM engine 204, we provide a causal system model. 
This allows the same problem to be simplified in the SDM 
by providing a component that automatically looks for 
relationships between observations in the data to fill in the 
blanks or gaps that are not explicitly provided from the 
SDM. 

The purpose of the Causal System Model (CSM) is to 
relate anomalous sensor readings to a functional unit of the 
system. If the anomaly corresponds to a known fault mode 
and/or shows up only in discrete data, we are confident 
which part of the system is functioning incorrectly. The 
more complex case is for novel, unmodeled events. Given 
the “unmodeled event” data, the goal is to identify which 
signals contribute to the event. The sensor signals are 
combined with faults that we know about, giving us a large 
collection of signals all taking part in the anomaly. Each 
signal originates somewhere in the system, so we have 
implicated a large number of components as potentially 
failed. But most of these are secondary effects. We need to 
find the root cause. 

This is accomplished by decomposing the problem into 
smaller modules called knowledge sources and by providing 
a dynamic and flexible knowledge application strategy. The 
same concept can be extended to problems requiring 
involvement of multiple agents representing different 
domains. 

The CSM reacts as and when conflicts arise during 
problem solving and uses conflict-resolution knowledge 
sources in an opportunistic manner. Essentially, the CSM 
provides a high-level abstraction of knowledge and solution 
and the derived relationships between observation and 
implication. 

The three basic components of the CSM are the knowl- 
edge sources, blackboard data structure and control. In the 
CSM, knowledge required to solve the problem is decom- 
posed into smaller independent knowledge sources. The 
knowledge sources are represented as SHINE If-Then rules. 
Each rule set or knowledge source contains knowledge for 
resolving one task in the diagnostic model. The blackboard 
holds the global data and the information on the problem- 
solving states. Activation of the knowledge sources modifies 
the states in the blackboard leading to an incremental causal 
relationship for actor signal identification. 

Since the causal relationship is decomposed into a hier- 
archical organization, the concept of an event becomes 
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predominant in this blackboard -centered strategy. Any 
change in the blackboard is considered an event. Any change 
in the solution state either due to generation of additional 
information or modification of existing information is imme- 
diately recorded. The execution controller notes this change 
and takes the necessary actions by invoking an appropriate 
knowledge source. This process repeats until the final causal 
relationship is obtained. 

Since the CSM is feed with a real-time stream of events 
(anomalies, suspected bad signals, events and unmodeled 
events), the arrival of a new event can make a previous 
concluded causal relationship incorrect. In such cases, cor- 
responding stages have to be undone by backtracking all the 
previously made assumptions leading to the reasoning to be 
non- mono tonic. This requires a dependency network to be 
incrementally maintained as the causal assumptions are 
generated using the knowledge sources. 

FIG. 14 shows a block diagram of an illustrative embodi- 
ment of this aspect of the invention. The processing block 
1402 bundles the conclusions from the symbolic and 
numeric components to create the complete list of affected 
signals. The block comprises two knowledge sources 1412 
and 1414. This can be done for diagnosis (the Predicted 
Failures path) or for prognosis (the Redline Estimate path). 
In the former, we are including signals that demonstrate 
explicit failures. In the latter, we include signals that are 
expected to cross redlines soon, as reported by the Prognos- 
tic Assessment module. The SHINE inference engine 1404 
relates the signals to functional blocks in the system. The 
cascading failures block 1406 backsolves through the sys- 
tem block representation to contrive a minimum hypothesis. 

Interpretation Layer 

The Interpretation Layer (IL) 218 of FIG. 15 collates 
observations from separate components and submits a single 
fault report in a format usable to recovery and planning 
components or to system operators. This is a knowledge- 
based component that is totally dependent upon the domain 
and the desired format of the output. 

As its inputs it accepts a list of events from the CSM and 
possible conclusions from the SDM as described above. Any 
supported conclusion that the SDM generates is considered 
a final output and is translated into the required output 
format. 

The CSM events can be decomposed into rule -based 
anomalies and detected events from the numeric engine. The 
interpretation layer performs a many-to-many mapping of 
faults (events) to interpretations. Each interpretation has a 
context associated with it. Because of this context, when 
multiple interpretations are generated within the same 
context, they can be grouped together as one interpretation 
containing several elements. This is typical of events in 
complex systems in general. 

Contexts are assigned by a contextual engine within the 
interpretation layer. Its purpose is to look for commonalties 
among each unique interpretation. In this manner if there is 
either a causal or interdependent relationship between 
interpretations, they are considered as possibly related. For 
example, if we have an alarm condition occurring on signals 
monitoring volts and/or amps, and the SDM concluded a 
fault based upon the number of watts generated, the engine 
will combine the volts and amps alarms with the watts 
conclusion. This provides for a very concise statement of the 
fault at hand without the user being deluged with disjointed 
information from many different sensors. 

The final reduced set of interpretations is processed by a 
component that reduces interpretations to conclusions. A 
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rule-based model is used to apply relationship definitions 
between interpretations, their causal relationships and their 
supported conclusion. For example, if the SDM did not 
generate a conclusion for watts being in alarm based upon 
5 the signals of volts and amps being overranged, then such a 
conclusion can be made here and generated as a final output. 

4.5.4 Implementation 

Detecting modeled and unmodeled events and system 
prognostics using real-time inputs makes use of multiple 
types of knowledge such as detection, diagnostic, simulation 
and causal modeling. To combine these different approaches 
heuristic, experiential knowledge is used to quickly isolate 
candidate faults and then use deeper causal model-based 
reasoning to analyze the problem in detail and eliminate 
incorrect hypotheses. 

The Symbolic Data Model is a knowledge -based system 
that provides a control structure that can easily switch 
20 between different types of knowledge and reasoning strate- 
gies. In addition, it provides multiple knowledge represen- 
tations suited to each type of knowledge employed and 
provides mechanisms to move easily from one representa- 
tion to the next during problem solving. 

25 Part of our system uses an approach based upon 
DRAPhys (Diagnostic Reasoning About Physical Systems) 
developed at NASA Langley Research Center. One advance- 
ment over the DRAPhys system is that we include 
knowledge-based modules for specific strategies of diagnos- 
30 tic reasoning that do not need tight coupling. This makes 
construction of the expert system much easier because 
software and domain knowledge can be reused for models of 
different hardware. Like DRAPhys system, we include a 
knowledge-based model that preprocesses the qualitative 
35 interpretation of sensor data prior to analysis. 

The input is quantitative sensor data from either a real- 
time data source or archived data. This can take the form of 
a real-time data stream, or a “beacon” approach using only 
significant events. 

40 The fault monitor compares the sensor data with the 
output of the quantitative model that simulates the normally 
functioning physical system. The monitor signals a fault 
when the expected system state derived from the system 
model differs from the actual system state. The expected 
45 behavior is a combination of engineering predictions from 
the Gray Box physical model representation and real-time 
sensor values. 

When a fault is detected, the monitor provides the diag- 
50 nostic process with a set of the abnormal sensor values in 
qualitative form, e.g., the symbol signal to noise ratio is 
exceeding predictions with respect to current ground system 
configuration, along with time tags to show the temporal 
ordering of symptoms. 

55 The diagnostic process is divided into several discrete 
stages and each stage has a unique diagnosis strategy. Each 
of these stages and their relationships to one another are 
described below. 

Stage 1: Diagnosis by Fault-Symptom Association 
60 The first stage utilizes heuristic, experiential knowledge 
generated from the expertise of engineers to compare fault 
symptoms with known fault types and failure modes. The 
most commonly occurring faults are diagnosed in this stage. 
However, the first stage will be unable to identify the cause 
65 of failures or predicted failures that are unusual or difficult 
to determine from the qualitative sensor information pro- 
vided. 
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Stage 2: Model-Based Diagnosis 

The second stage of the diagnostic process is a 
knowledge-based system that is based on a functional model 
of the underlying system. 

The purpose of this stage is to localize a fault by devising 
hypotheses based on how the effects of the fault would 
propagate through the subsystem. When this stage fails to 
identify a unique failure or maintenance prediction, then the 
third stage of diagnosis is entered. 

Stage 3: Numeric Analysis 

In this stage the system uses sensor data, results from 
software, and commands which are simultaneously fused in 
real-time to automatically abstract system physics and infor- 
mation invariants (constants). This makes the system ultra- 
sensitive to system degradation and change so that shifts can 
be isolated in time and space to specific sensors. The 
numeric analysis modules predict faults prior to loss of 
functionality. 

Stage 4: Interpretation Layer 

The numeric components excel at flagging individual or 
combined sensors that show abnormal values, but it does not 
relate the sensors to actual physical systems. The interpre- 
tation layer combines the numeric results with the symbolic 
results to form an integrated conclusion about the failure or 
set of failures in the system. 

When a unique diagnosis cannot be identified, the system 
will provide information about potentially failed functions in 
order to aid more efficient diagnosis or repair of the problem. 

4.5.5 Technical Perspective 

Our system is specifically designed for real-time fault 
detection and prognostics of modeled and unmodeled 
anomalies. It differs substantially from other systems that 
only identify the fault. The objective is to not only identify 
the fault from multiple stages of analysis, but to determine 
the effects that the failure will have on the functionality of 
the subsystem as a whole, and what remedial actions are 
most appropriate to correct the deficiency. Multiple levels of 
abstracted diagnostic capability applied to all affected sub- 
systems provides the ability to determine the overall effect 
of faults on system performance. 

Real-time performance affects the information available 
for diagnosis and the particular reasoning strategies that may 
be employed. One consideration is that a physical system’s 
behavior may change as time progresses while performing 
fault diagnosis. During diagnosis, failure effects may propa- 
gate to other functionally or physically connected sub- 
systems. This dynamically changes the failure symptoms 
with which the diagnosis system must reason. We intend to 
make use of this dynamically changing information about 
the system to identify the specific physical cause(s) of a 
failure, the fault type, responsibility and affected system 
components and the fault propagation history. 

Each stage of the diagnostic process utilizes the sequence 
of changing fault symptoms to focus the reasoning process 
and eliminate fault hypotheses. The first stage includes a 
rule-based system that includes temporal reasoning func- 
tions. This helps capture knowledge about changing symp- 
toms associated with specific failures. For example, when 
we have excessive phase noise in a receiver, then the system 
will observe the symptoms as, “The receiver goes in and out 
of lock . . . Using dynamic information early in the 
diagnostic process helps to distinguish among faults that 
may have the same initial symptoms but diverge in subse- 
quent behavior. For example, it could be an unexpected 
transient weather condition or a slowly degrading receiver or 
a drifting antenna error. 
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The functional and physical models used by the second 
stage can be thought of as a constraint-based dependency 
network. A functional interaction or physical adjacency 
between two subsystems is represented as a dependency 
5 association in the network. 

The diagnostic process in stage two attempts to map 
failure symptoms to specific components in the models and 
then determine the additional affected components by trac- 
ing through the network. The time-order of symptoms 
10 benefits this process by suggesting or confirming a sequence 
of components affected by the failure. 

The first component in a functionally connected sequence 
of components exhibiting failure symptoms is deduced to be 
the component responsible for the failure. A model of 
15 physical adjacency is used to resolve ambiguity, such as 
when a fault propagates physically between subsystems that 
are not functionally connected. 

The interpretation layer combines the results from the 
2Q numeric and symbolic engines and maps them to the physi- 
cal model for easy identification. It also takes collections of 
interrelated failures, e.g., a cascading chain of failures 
originating from a single point, and relates it to the single 
failing point rather than all the symptoms. 

25 What is claimed is: 

1. A method for diagnosis and prognosis of faults in a 
physical system comprising: 

providing sensor data representative of measurements 
made on the physical system, the measurements being 
30 representative of values of signals produced by the 
physical system; 

producing model enhanced sensor signals by fitting the 
sensor data to at least a partial physical model of the 
physical system; 

35 identifying correlated signals from among the sensor data; 

comparing the correlated signals with expected correlated 
signals to detect one or more occurrences of events, the 
expected correlated signals representative of known 

operating conditions of the physical system; 

40 

providing discrete data representative of system status 
variables and system command information; 

detecting discrepancies among the discrete data; 

identifying the one or more occurrences of events as 
45 unmodeled events based at least on the model enhanced 
sensor signals; and 

verifying faults based on the discrepancies among the 
discrete data. 

2. The method of claim 1 wherein the correlated signals 
50 are based on a coherence coefficient, (^-, defined by: 

ICo y(S h Sj)\ 

^ Max(V ar (S'; ), Var (Sj)) ’ 

55 

where 

S,- and S- are the signals produced by the physical 
system, 

60 l r 

Cov(Sj, Sj) = - I (Si -Si)(Sj-Sj)dt, and 

Var(^) = i l'(S ; -Si ) 2 dr. 

65 

3. The method of claim 1 further including performing a 
training sequence to produce the expected correlated signals. 
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4. The method of claim 1 further including identifying 
suspect bad signals by detecting discrepancies among the 
sensor data based on a statistical model of the sensor data, 
wherein the step of identifying the unmodeled events is 
based on the suspect bad signals in addition to the model 5 
enhanced sensor data. 

5. The method of claim 4 further including identifying 
statistical components of the sensor data, wherein the sta- 
tistical model is based only on the statistical components of 10 
the sensor data. 

6 . A system health monitor for diagnosis and prognosis of 
faults in a physical system being monitored comprising: 

a model filter having at least a partial model representa- 
tion of the physical system, the model filter operable to 
produce a plurality of model enhanced signals based on 
sensor data, the sensor data representative of measure- 
ments made on the physical system; 

a symbolic data model operable to produce predicted 20 
system states based on discrete data comprising system 
status variables and system command information, the 
symbolic data model further operable to detect discrep- 
ancies among the discrete data; ^ 

a first anomaly detector operable to identify unmodeled 
events by computing one or more coherence statistics 
from the sensor data and comparing the coherence 
statistics against expected coherence quantities indica- 
tive of known operating conditions of the physical 30 
system; 

a predictive comparator module operable to confirm a 
failure based on detected discrepancies among the 
discrete data, and to distinguish the unmodeled events 35 
from modeled events based at least on the model 
enhanced signals; 

a prognostic assessment module operable to produce 
predicted faults using a stochastic model of the sensor 
data to produce future values of the sensor data from 
the stochastic model; and 

a presentation module for presenting information relating 
to the health of the system comprising detected 
discrepancies, a categorization of modeled and unmod- 45 
eled events, and predicted faults, the information suit- 
able for a human user or a machine process. 

7. The system of claim 6 further including a second 
anomaly detector operable to detect discrepancies in the 
sensor data based on a statistical model of the sensor data, 50 
the discrepancies in the sensor data being identified as 
suspect bad signals, the predictive comparator further dis- 
tinguishing the unmodeled event from the modeled events 
based on the suspect bad signals. 

8 . The system of claim 6 further including a filter to 55 
identify deterministic components contained in the sensor 
data and to produce residual data from the sensor data that 

is absent the deterministic components; and a second 
anomaly detector operable to detect discrepancies in the 6Q 
sensor data based on a statistical model of the residual data, 
the discrepancies in the sensor data being identified as 
suspect bad signals, the predictive comparator further dis- 
tinguishing the unmodeled event from the modeled events 
based on the suspect bad signals. 65 

9. The system of claim 6 wherein the coherence statistics 
are based on a coherence coefficient, t, £ - 9 defined by: 


ICov (3,^)1 
Max(Var (Si), Var (Sj)) ’ 

where 

S- and S- are time-varying signals represented by the 
sensor data, 

Co VOS';, Sj) = jji (Si - SiXSj - S j)dt, and 
Var (Si)= y I* (Si -Sifc&t. 

10. A computer program product effective operating a 
computer system for diagnosis and prognosis of faults in a 
physical system comprising: 

computer-readable media; and 

computer-executable instructions recorded on the 
computer-readable media comprising: 
first executable program code effective to operate the 
computer system to receive sensor data representa- 
tive of measurements made on the physical system 
and to receive discrete data, the measurements rep- 
resentative of values of signals produced by the 
physical system, the discrete data representative of 
system status variables and system command infor- 
mation; 

second executable program code effective to operate 
the computer system to produce model enhanced 
sensor signals by fitting the sensor data to at least a 
partial physical model of the physical system; 
third executable program code effective to operate the 
computer system to identify correlated signals from 
among the sensor data; 

fourth executable program code effective to operate the 
computer system to compare the correlated signals 
with expected correlated signals to detect one or 
more occurrences of events, the expected correlated 
signals representative of known operating conditions 
of the physical system; and 
fifth executable program code effective to operate the 
computer system to identify the one or more occur- 
rences of events as unmodeled events based at least 
on the model enhanced sensor signals, to detect 
discrepancies among the discrete data, and to verify 
faults based on the discrepancies among the discrete 
data. 

11. The computer program product of claim 10 further 
including sixth executable program code effective to operate 
the computer system to identify suspect bad signals by 
detecting discrepancies among the sensor data based on a 
statistical model of the sensor data, wherein the unmodeled 
events are further based on the suspect bad signals. 

12. The computer program product of claim 11 wherein 
the sixth program code further includes program code for 
identifying statistical components of the sensor data, 
wherein the statistical model is based only on the statistical 
components of the sensor data. 

13. The computer program product of claim 10 wherein 
the correlated signals identified are based on a coherence 
coefficient, defined by: 



53 


US 6,625,569 B2 


^ Max(Var OS;), Var (Sj)) ’ 

where 

S t - and S- are the signals produced by the physical 
system, 

Coves';, Sj) = jJ ' 1 ( s > - SiXSj - Sj)dt, and 
Var(^) = jJ(Si -sfdt. 

14. A system health monitor for detecting anomalies in a 
physical system being monitored comprising: 

a model filter having at least a partial model representa- 
tion of the physical system, the model filter operable to 
produce a plurality of model enhanced signals based on 
sensor data, the sensor data representative of measure- 
ments made on the physical system; 
a symbolic data model operable to produce predicted 
system states based on discrete data comprising system 25 
status variables and system command information, the 
symbolic data model further operable to detect discrep- 
ancies among the discrete data; 
means for identifying correlated signals from the sensor 
data; 
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data store comprising a plurality of expected coherence 
quantities representative of known operating conditions 
of the physical system; 

means for selecting one or more of the expected coher- 
ence quantities based on the predicted system states; 
and 

means for identifying an unmodeled event by comparing 
the correlated signals against one or more selected 
expected coherence quantities, wherein the unmodeled 
event constitutes a detected anomaly. 

15. The system of claim 14 further including training 
means for producing the expected coherence quantities. 

16. The system of claim 14 wherein the correlated signals 
are identified based on a coherence coefficient, £■■, defined 
by: 

|Cov (.S',, y)| 

Max(Var (Si), Var (Sj)) ’ 

where 

S t - and Sj are time-varying signals represented by the 
sensor data, 

Co VOS'; , Sj) = (Si - Si)(Sj - S j)c£ t, and 

Var OS';) = y ^(Si -Si) 2 dr. 





